Re: [CODE4LIB] Library Juice - thoughts?
I've not taken any classes on LibraryJuice, mainly because I find their course descriptions too thin. The Data Management course has a better description than most, but perhaps I've been spoiled by Coursera, where I can see a syllabus, schedule, and materials before deciding to pay any fees.

I'm wondering, those of you who have taken a LibraryJuice course: what attracted you to it, and how did the experience match or differ from your expectations?

Dave

On Wed, Oct 28, 2015 at 2:58 AM, Folds, Dusty wrote:

> Yes, I concur with these comments. Just be aware of the time commitment
> that will be involved. That's where I ran into problems, too.
>
> Dusty
>
> --
> Dusty Folds, MLIS
> Information Literacy and Digital Learning Librarian
> Assistant Professor
> University of Montevallo
> Carmichael Library
> Station 6108
> Montevallo, AL 35115
> P: 205-665-6108
> F: 205-665-6112
> E: dfo...@montevallo.edu
>
> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> REESE-HORNSBY, TWYLA
> Sent: Tuesday, October 27, 2015 1:47 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Library Juice - thoughts?
>
> I started a course (Introduction to XML) through Library Juice a year
> ago. I wasn't able to finish it due to some personal challenges but I
> still have access to the archived class, which is great. Like Patricia, I
> found the content very useful but underestimated how much time I needed to
> read and study the material. Four weeks goes fast! The instructor also
> scheduled times to meet online for questions.
>
> I did have trouble getting used to the Moodle platform but I think it has
> since been upgraded to be more user friendly.
>
> I am seriously considering taking another course in the near future.
>
> Best,
>
> Twyla Reese-Hornsby
> Public Service Librarian | J. Ardis Bell Library
> Tarrant County College Northeast Campus | Office: NLIB 2127A
> 828 W. Harwood Rd. | Hurst, TX 76054
> 817-515-6365 | Fax: 817-515-6275
> twyla.reese-horn...@tccd.edu | www.tccd.edu
>
> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Patricia Farnan
> Sent: Monday, October 26, 2015 9:40 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Library Juice - thoughts?
>
> I recently did a course through Library Juice on PHP & APIs, and I found
> it really useful and easy to follow (well, easy for my poor brain to
> follow. I still had to re-read my notes and re-listen to certain parts of
> each video, to really let things sink in). The instructor was very good at
> staying in touch with students and interacting.
>
> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> BWS Johnson
> Sent: Tuesday, 27 October 2015 4:14 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Library Juice - thoughts?
>
> Salvete!
>
> I'm going to be exceedingly naughty in replying to this. I used to
> teach a course on Koha for Rory, so obviously I'm heavily biased.
>
> I taught twice, and as a fringe perq, he let instructors take certain
> courses gratis.
>
> I would say overall that you're in for a treat. When it first started
> it was a small experimental thing. A lot of the students' experiences
> varied widely by how much they participated and which instructor they
> selected. Rory has gone out of his way over the years to solidify the
> lineup so that you get a good instructor. Compared to my University, they
> are WAY cheaper. They weren't as comprehensive as my University, but hey,
> that would be a really high bar. Also, they're designed with someone that's
> working full time in mind.
>
> As far as I know, they're still using Moodle, so if you're familiar
> with that platform, you'll be right at home.
>
> The time commitment will vary by course, as well. I bet that Rory
> would give you your instructor's email in advance to feel things out and
> see how heavy the workload might be.
>
> So yeah, go for it!
>
> Hope this helped,
> Brooke
[CODE4LIB] Thomson Reuters and Impact Factors
Hi all, If I wanted to subscribe to up-to-date impact factor information from Thomson Reuters, which product would I need to purchase (JCR, InCites, ESI, etc.) and is there a general ballpark for price? Thanks! Dave
Re: [CODE4LIB] Definitional Question
"How many humanities scholars does it take to define digital humanities? Good question, to which there is no good answer, but rather just more questions..."
- Paul Spence, courtesy of http://whatisdigitalhumanities.com/

I can't help but think that defining digital humanities is overthinking it. Scholarship is practiced by non-humanities disciplines such as the natural sciences as well as by the humanities. Simply appending "digital" to it doesn't clearly refer to anything except the vague notion that somewhere, somehow, computers and/or fingers play an important role.

DL

On Fri, Jul 3, 2015 at 4:13 AM, Nick Szydlowski nick.szydlow...@bc.edu wrote:

> I like Bryan's answer as well. I've heard a lot of comments and jokes about the difficulty of defining digital humanities; this site gives a different definition each time you refresh the page: http://whatisdigitalhumanities.com/
>
> Nick
>
> Nick Szydlowski
> Digital Initiatives and Scholarly Communication Librarian
> Boston College Law School
> 617 552-4474
>
> On Thu, Jul 2, 2015 at 3:04 PM, McAulay, Elizabeth emcau...@library.ucla.edu wrote:
>
>> Bryan's answer is very well thought out and jibes with my understanding of this topic, too.
>>
>> From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of Bryan Brown bjbr...@fsu.edu
>> Sent: Thursday, July 02, 2015 11:49 AM
>> To: CODE4LIB@LISTSERV.ND.EDU
>> Subject: Re: [CODE4LIB] Definitional Question
>>
>> Hi Matt,
>>
>> I work in the Technology & Digital Scholarship department of Florida State University Libraries, and I spent my first few months trying to come up with answers to those exact questions. Here's what I came up with:
>>
>> Digital humanities is the act of doing humanities scholarship using research methods enabled by new technology. The archetypical digital humanities project in my mind is text mining. If you are coming up with humanities data and using data analysis tools on it, you are probably doing DH work (IMHO).
>>
>> Digital scholarship is the idea of DH, but extended outside of DH to all scholarship. How does new technology affect scholarship in psychology? biochemistry? law? A big problem that I see with "digital scholarship" is that I have yet to hear anyone outside of libraries or DH communities use it. The humanities haven't always been so digital, so the term "Digital Humanities" is a semi-useful way to differentiate this specific form of research from more traditional methods. The "digital" prefix has less utility outside of the humanities; science has always been pretty digital out of necessity, and other fields have adopted digital methods as they go. I've heard librarians use the term "e-science" sometimes, and it reminds me of the term "e-business" back in the 90's, but now almost all business is e-business so the term no longer makes much sense. Most scholarship these days is digital, which makes defining "digital scholarship" as something special a bit difficult.
>>
>> In our department we use "digital scholarship" to refer to parts of the scholarship process that are more technology-oriented, where faculty might not be aware of general best practices. Data management, research metadata, altmetrics, web publishing and licensing are some areas where we try to focus on supporting faculty. We aren't a huge department and we're learning as we go, so what digital scholarship means and how we can provide value to faculty members is a big point of discussion (although I'm sure we all have our own definitions and ideas).
>>
>> Just one person's opinion, I hope that doesn't confuse things further.
>>
>> -Bryan Brown
>>
>> On Thu, Jul 2, 2015 at 2:13 PM, Natalie Meyers natalie.mey...@nd.edu wrote:
>>
>>> This title may be of interest:
>>>
>>> Defining Digital Humanities: A Reader
>>> Edited by Melissa Terras, Julianne Nyhan and Edward Vanhoutte
>>> December 2013
>>> 978-1-4094-6963-6
>>> $44.95
>>>
>>> On Thu, Jul 2, 2015 at 1:58 PM, Matt Sherman matt.r.sher...@gmail.com wrote:
>>>
>>>> Hi all,
>>>>
>>>> This is a bit more of a philosophical question which might only apply to a few people, but I am trying to work out some definitions for my own edification. So for those in the digital scholarship and digital humanities subset, I would be interested in getting some thoughts on these three questions:
>>>>
>>>> 1) How would you define digital scholarship?
>>>> 2) How would you define digital humanities?
>>>> 3) Are they the same thing and why or why not?
>>>>
>>>> Any thoughts are appreciated as I am trying to think through this myself.
>>>>
>>>> Matt Sherman
>>>
>>> --
>>> *Natalie K. Meyers*
>>> *E-Research VecNet Digital Librarian*
>>> *Hesburgh Libraries*
>>> *University of Notre Dame*
>>> 1136A Hesburgh Library
>>> Notre Dame, IN 46556
>>> *o:* 574-631-1546
>>> *f:* 574-631-6772
>>> *e:* natalie.mey...@nd.edu
>>> http://library.nd.edu/
Re: [CODE4LIB] hathitrust research center workset browser
If your *institutional* email address is not on their whitelist (not sure if it is limited to subscribing ones, they don't say) you cannot register using the signup form; instead you can only request an account by briefly explaining why you want one. Weird, because they'd have potentially learned more about me if they just let me put my gmail address in the signup form.

I don't get it - can all users download public domain content? If they give me an account, will I be indistinguishable from a subscribing institution? If not, why the extra hoops?

On Fri, May 29, 2015 at 1:51 AM, Eric Lease Morgan emor...@nd.edu wrote:

> On May 27, 2015, at 6:33 PM, Karen Coyle li...@kcoyle.net wrote:
>
>>> In my copious spare time I have hacked together a thing I’m calling the HathiTrust Research Center Workset Browser, a (fledgling) tool for doing “distant reading” against corpora from the HathiTrust. [0, 1] ...
>>>
>>> 'Want to give it a try? For a limited period of time, go to the HathiTrust Research Center Portal, create (refine or identify) a collection of personal interest, use the Algorithms tool to export the collection's rsync file, and send the file to me. I will feed the rsync file to the Browser, and then send you the URL pointing to the results.
>>>
>>> [0] introduction in a blog posting - http://ntrda.me/1FUGP2g
>>> [1] HTRC Workset Browser - http://bit.ly/workset-browser
>>
>> Eric, what happens if you access this from a non-HT institution? When I go to HT I am often unable to download public domain titles because they aren't available to members of the general public.
>
> The short answer is, “Nothing”. The long answer is… longer.
>
> The HathiTrust proper is accessible to anybody, but the downloading of public domain content is only available to subscribing institutions.
>
> On the other hand, the “Workset Browser” is designed to work off the HathiTrust Research Center Portal, not the HathiTrust proper. The Portal is located at http://sharc.hathitrust.org From there anybody can search the collection of public domain content, create collections, and apply various algorithms against collections. One of the algorithms is “create RSYNC file” which, in turn, allows you to download bunches o’ metadata describing the items in your collection. (There is also a “download as MARC” algorithm.) This rsync file is the root of the Workset Browser. Feed the Browser a rsync file, and the Browser will mirror content locally, index it, and generate reports describing the collection.
>
> Thank you for asking. Many people do not know there is a HathiTrust Research Center.
>
> -- Eric Morgan
Re: [CODE4LIB] hathitrust research center workset browser
They just informed me I need a .edu address. Having trouble understanding the use of the term "public domain" here.

On Mon, Jun 1, 2015, 9:58 PM Eric Lease Morgan emor...@nd.edu wrote:

> On Jun 1, 2015, at 4:33 AM, davesgonechina davesgonech...@gmail.com wrote:
>
>> If your *institutional* email address is not on their whitelist (not sure if it is limited to subscribing ones, they don't say) you cannot register using the signup form, instead you can only request an account by briefly explaining why you want one. Weird, because they'd have potentially learned more about me if they just let me put my gmail address in the signup form.
>>
>> I don't get it - can all users download public domain content? If they give me an account, will I be indistinguishable from a subscribing institution? If not, why the extra hoops?
>
> Dave, you are the second person to bring this “white listing” issue to my attention. Bummer! Yes, apparently, unless your email address is a part of wider something or another, then you need to be authorized to use the Research Center. Weird! In my opinion, while the Research Center’s tools work, I believe the site suffers from usability issues.
>
> In any event, I have enhanced the auto-generated reports created by my “Browser”, and while they are very textual, I also believe they are insightful. For example, the complete works of:
>
> * William Ellery Channing - http://bit.ly/browser-channing-about
> * Jane Austen - http://bit.ly/browser-austen-about
> * Ralph Waldo Emerson - http://bit.ly/browser-emerson-about
> * Henry David Thoreau - http://bit.ly/browser-thoreau-about
>
> -- Eric “Beginning To Suffer From ‘Creeping Featuritis’” Morgan
Re: [CODE4LIB] Library Hours
I contacted the group behind the Indiegogo campaign on Twitter: https://twitter.com/davesgonechina/status/596148115465371649

Caravan Studios (@caravanstudios), May 2:
  Help us raise $10K to put #libraries locations hours in #Rangeapp help youth find free #summermeals #safeplaces http://bit.ly/rangecampaign http://t.co/Pq9Nmi8nQT
  https://twitter.com/caravanstudios/status/594226589631533056

davesgonechina (@davesgonechina), 11:02 AM - 7 May 2015:
  @caravanstudios also, library hours change often, budgets get cut. Is $10K enuff 2 run regular scrapes for years, or is this a one-off?

Caravan Studios (@caravanstudios):
  .@davesgonechina this is a one time push for this summer. We'll open up the system so librarians can update their own data next year.
  https://twitter.com/caravanstudios/status/596400357242077184

davesgonechina (@davesgonechina):
  @caravanstudios that presumes librarians have the bandwidth/inclination to update ur $10K DB. Just sayin.
  https://twitter.com/davesgonechina/status/596461555710955520

On Thu, May 7, 2015 at 1:33 AM, Dan Scott deni...@gmail.com wrote:

> On Wed, May 6, 2015 at 8:15 AM, Ethan Gruber ewg4x...@gmail.com wrote:
>
>> +1 on the RDFa and schema.org. For those that don't know the library URL off-hand, it is much easier to find a library website by Googling than it is to go through the central university portal, and the hours will show up at the top of the page after having been harvested by search engines.
>
> Hi, so this is an area where I've done, and am doing, a fair bit of work. See http://stuff.coffeecode.net/2015/ola_white_hat_seo/#/1/10 for some fun slides from a presentation I gave in January at the Ontario Library Association SuperConference that show some ways data gets into Google/Yahoo/Bing and conclude that the OCLC Registry "manually maintain yet another copy of your data elsewhere" approach isn't working. (Hit "s" to get speaker notes.) The rest of the presentation goes into depth on how to use RDFa to mark up a real library web page with location, contact info, opening hours, and event info.
>
> And I've posited that crawling library sites to pull single-sourced data (e.g. you update your website to provide updated hours to humans, and the machines automatically benefit) would be a much more effective, accurate, and usable approach than maintaining copies of the data in Google+, OCLC Registry, etc. We could produce results like http://cwrc.ca/rsc-src/ that stay accurate, rather than being one-off efforts that decay over time. (It would be great if the OCLC Registry had a "crawl this URL" option so that it could keep all of its data up-to-date and incentivize libraries to publish the data in a machine-readable format such as RDFa + schema.org.)
>
> On the "but that's technically challenging" front, I tried pursuing some grant funding to produce templates for publishing that structured info in Drupal, Joomla, and other commonly used CMSs. Sadly, my application was recently denied, but that will only slow me down; I'm not going to give up on the goal. I have a paper in the works that will expand on the content of the presentation for those sites that have the ability (technical and administrative) to modify their own web pages.
>
> Sites running the Evergreen library system already generate a page for each of their libraries that contains this structured data (e.g. https://laurentian.concat.ca/eg/opac/library/OSUL), which is single-sourced from the data that has to be maintained in the library system anyway.
>
> I'll happily acknowledge that getting search engines to harvest the right data is not easy, though: right now, for example, if you search for "J.N. Desmarais Library" it currently shows that the library is open 24 hours a day, which is completely false--probably maliciously submitted--information. *sigh* I've edited that info in the Google+ page
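
For anyone curious what harvesting single-sourced hours could look like on the consuming side, here is a minimal sketch. It assumes a crawler has already pulled the text value of a schema.org `openingHours` property (for example "Mo-Fr 09:00-17:00") out of a library page; the function names are hypothetical, and real pages may use the richer `OpeningHoursSpecification` markup instead, which this does not handle.

```python
from datetime import time

# Two-letter day codes used by the schema.org openingHours text format.
DAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]


def expand_days(spec):
    """Expand 'Mo-Fr' or 'Mo,We,Fr' into a list of two-letter day codes."""
    days = []
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            i, j = DAYS.index(start), DAYS.index(end)
            if i <= j:
                days.extend(DAYS[i:j + 1])
            else:
                # Allow ranges that wrap around the week, e.g. 'Sa-Mo'.
                days.extend(DAYS[i:] + DAYS[:j + 1])
        else:
            DAYS.index(part)  # raises ValueError on an unknown day code
            days.append(part)
    return days


def parse_opening_hours(value):
    """Parse one openingHours value such as 'Mo-Fr 09:00-17:00'.

    Returns a dict mapping each day code to an (opens, closes) pair.
    """
    day_spec, hours = value.split()
    opens, closes = hours.split("-")
    return {
        day: (time.fromisoformat(opens), time.fromisoformat(closes))
        for day in expand_days(day_spec)
    }
```

Calling `parse_opening_hours("Mo-Fr 09:00-17:00")` gives one entry per weekday, which a scheduled crawl could compare against the previously stored hours to spot changes without anyone re-keying data.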
Re: [CODE4LIB] Protagonists
Hey, thanks everybody. I've been too busy to dig into any of your suggestions, but they are hugely appreciated. This group is awesome.

@Amanda, I actually remember signing up for Small Demons in beta, and it died before I got a chance to really explore it.

@Thomas, LibraryThing's characternames field looks very promising if the list consistently gives main characters first billing.

@Shaun, Trajectory is definitely interesting, though I've not thought of a use case yet.

@Karen, true about the authority problem - unless publishers wrap this sort of info in ebook metadata?

@Joshua, like LibraryThing, it's unclear if the character lists are actually prioritized by significance.

@Joel, shame those resources look rather dusty. As for an IMDB for books, I think LibraryThing or Amazon are better positioned than anyone.

@Brooke, I'm absolutely certain it's doable, but as @Amy points out it's a pain in the ass. Even if I simply take @Alexander's suggestion of the Le Monde list, I have to scrape and scan and scrub for something that, in a world where we can have nice things, would already exist as a rough-and-ready, incomplete but off-the-shelf dataset. It kinda blows my mind it doesn't. Not to mention there's the other step I mentioned, which is matching them up with Gutenberg.org pages.

I'll keep you guys updated as I dig into all your ideas.

Cheers!
Dave

On Wed, Apr 15, 2015 at 4:17 AM, Thomas Guignard thomas.guign...@gmail.com wrote:

> The LibraryThing API could also be used to retrieve what they call Common Knowledge tags, including character names but also place names etc.
>
> Example: https://www.librarything.com/services/rest/1.1/?method=librarything.ck.getwork&id=2773690&apikey=d231aa37c9b4f5d304a60a3d0ad1dad4 (using the example API key)
>
> Look for the characternames field. As far as I can tell, however, there is no way to determine which of the characters are the lead male and lead female character, short of assuming that the top listed characters are in effect the lead ones. Also, the API calls are limited to 1000 a day. But maybe an avenue to consider.
>
> t.
>
> On Tue, Apr 14, 2015 at 2:15 PM, Shaun Ellis sha...@princeton.edu wrote:
>
>> Another interesting startup in this area is Trajectory. Here's a list of Classics/Fiction via their JSON API (doc=isbn): http://api.trajectory.com/api/v1/search/?q=c=Fiction%20%2F%20Classicslimit=568
>>
>> Here's a human readable view: http://www.trajectory.com/search/?q=facetsc=Fiction%20%2F%20Classicslimit=568
>>
>> -Shaun
>>
>> On 4/14/15 11:07 AM, Amanda French wrote:
>>
>>> What you *did* need for this interesting project was Small Demons, which was a for-profit company that was creating linked data from books -- here's an article about it: http://www.theverge.com/2013/3/1/4043298/building-an-atlas-for-books-with-small-demons
>>>
>>> But it shut down in 2013, and I have no idea what happened to the data. It might all have been commercial and proprietary, anyway. Article on its closure: http://www.latimes.com/books/jacketcopy/la-et-jc-small-demons-to-close-unless-buyer-appears-20131106-story.html
>>>
>>> Amanda
>>>
>>> On 4/13/15 10:12 PM, davesgonechina wrote:
>>>
>>>> So I have this idea I'd like to do for a hobby project, but it requires finding a table that lists a classic novel, a Gutenberg.org link to an snip
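
A rough sketch of how the characternames lookup described above might be scripted. The method name and URL parameters come from the example URL in the thread; everything about the response layout is an assumption (the sample XML below is a hypothetical stand-in, not a captured response), so the parser simply looks for any `field` element named `characternames` and collects the `fact` elements under it, ignoring namespaces.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE = "https://www.librarything.com/services/rest/1.1/"


def ck_url(work_id, api_key):
    """Build a Common Knowledge request URL for one work."""
    query = urlencode({
        "method": "librarything.ck.getwork",
        "id": work_id,
        "apikey": api_key,
    })
    return BASE + "?" + query


def local(tag):
    """Drop any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def character_names(xml_text):
    """Return character names in document order (ASSUMED response layout)."""
    root = ET.fromstring(xml_text)
    names = []
    for el in root.iter():
        if local(el.tag) == "field" and el.get("name") == "characternames":
            for fact in el.iter():
                if local(fact.tag) == "fact" and fact.text:
                    names.append(fact.text.strip())
    return names


# Hypothetical response, shaped only for illustration.
SAMPLE = (
    '<response stat="ok"><ltml xmlns="http://www.librarything.com/">'
    '<item id="1" type="work"><commonknowledge><fieldList>'
    '<field name="characternames"><versionList><version><factList>'
    "<fact>Elizabeth Bennet</fact><fact>Fitzwilliam Darcy</fact>"
    "</factList></version></versionList></field>"
    "</fieldList></commonknowledge></item></ltml></response>"
)
```

`character_names(SAMPLE)` returns the names in listed order, which is as close as this gets to "first billing"; if a real response carries several versions of the field, the names would repeat and need de-duplicating. With the 1000-calls-a-day limit, responses are worth caching to disk.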
[CODE4LIB] Protagonists
So I have this idea I'd like to do for a hobby project, but it requires finding a table that lists a classic novel, a Gutenberg.org link to an instance of that work (first listed, one with most downloads, whichever), the lead female character, and the lead male character (can be null). E.g. Pride and Prejudice, http://www.gutenberg.org/ebooks/42671, Elizabeth Bennet, Mr. Darcy.

Even leaving the Gutenberg part for another day, this has been really difficult to find. I've had no success with Dbpedia/Wikidata, since there's no real standardized format for novels, characters often are associated more strongly with films or video games than with the original works (Cheshire Cat), and when characters are listed they are neither prioritized nor linked to a record that clearly states gender. And then there's how to select some sort of "Western Canon" list. ISBNs are nowhere to be found, nor any other identifier that might help to corral a fair chunk of results. I looked at OCLC, but WorldCat Works is still an experiment and frankly looks like too much work to query for too little return, even if it had good coverage. Amazon? Librarything? Goodreads? No luck yet.

I raise this partly because a) I would like to make some toys with that list, and b) I feel this is a good test case for what developers might want from library data, linked or otherwise. It is the sort of request that includes many unspoken assumptions (that there is a canon, and it is well-defined) that app users, product managers, and developers typically want even if it is woefully incomplete or imperfect, so long as it matches expectations. While I appreciate what it takes to make such a list, I feel like this really ought to be a solved problem in the library space. Not "in the process of being solved, hopefully, by new emerging standards" solved, but "we solved this ages ago, here ya go" solved.

I'm posting this basically in the hopes that someone will say "No, doofus, there's an easy way to do this, you just aren't very good at this - look:" and show me where I'm wrong.

D
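
To make the shape of the wished-for table concrete, here is a small sketch of one row as a Python record, read from CSV. The column names are invented for illustration; the first data row is the example from the post, and the second is a made-up placeholder showing the nullable lead male column.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class NovelRow:
    title: str
    gutenberg_url: str
    lead_female: Optional[str]
    lead_male: Optional[str]  # may be null, as the post allows


def read_rows(csv_text):
    """Parse CSV text into NovelRow records, mapping empty cells to None."""
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        rows.append(NovelRow(
            title=rec["title"],
            gutenberg_url=rec["gutenberg_url"],
            lead_female=rec["lead_female"] or None,
            lead_male=rec["lead_male"] or None,
        ))
    return rows


# Column names and the second row are hypothetical placeholders.
SAMPLE_CSV = (
    "title,gutenberg_url,lead_female,lead_male\n"
    "Pride and Prejudice,http://www.gutenberg.org/ebooks/42671,"
    "Elizabeth Bennet,Mr. Darcy\n"
    "Placeholder Novel,http://example.org/placeholder,Placeholder Heroine,\n"
)
```

Four columns is all the project needs; the hard part, as the post says, is finding a source that can fill them in.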
Re: [CODE4LIB] Data Lifecycle Tracking Documentation Tools
@John - Thanks, I'd be interested to learn more about the supportable pattern you mentioned, if there are any readings you'd recommend.

@Joe - Cheers, Andreas Rauber's presentation sounds particularly relevant. Do you have a link?

@Colin - Thanks for the feedback, I do plan to take a closer look at JIRA.

Dave

On Fri, Mar 13, 2015 at 11:49 PM, Joe Hourcle onei...@grace.nascom.nasa.gov wrote:

> On Wed, 11 Mar 2015, davesgonechina wrote:
>
>> Hi John,
>>
>> Good question - we're taking in XLS, CSV, JSON, XML, and on a bad day PDF of varying file sizes, each requiring different transformation and audit strategies, on both regular and irregular schedules. New batches often feature schema changes requiring modification to ingest procedures, which we're trying to automate as much as possible but obviously require a human chaperone.
>>
>> Mediawiki is our default choice at the moment, but then I would still be looking for a good workflow management model for the structure of the wiki, especially since in my experience wikis are often a graveyard for the best intentions.
>
> A few places that you might try asking this question again, to see if you can find a solution that better answers your question:
>
> The American Society for Information Science & Technology's Research Data Access & Preservation group. It has a lot of librarians & archivists in it, as well as people from various research disciplines:
>
>   http://mail.asis.org/mailman/listinfo/rdap
>   http://www.asis.org/rdap/
>
> ...
>
> The Research Data Alliance has a number of groups that might be relevant. Here are a few that I suspect are the best fit:
>
>   Libraries for Research Data IG
>   https://rd-alliance.org/groups/libraries-research-data.html
>
>   Reproducibility IG
>   https://rd-alliance.org/groups/reproducibility-ig.html
>
>   Research Data Provenance IG
>   https://rd-alliance.org/groups/research-data-provenance.html
>
>   Data Citation WG (as this fits into their 'dynamic data' problem)
>   https://rd-alliance.org/groups/data-citation-wg.html
>
> ('IG' is 'Interest Group', which are long-lived. 'WG' is 'Working Group', which are formed to solve a specific problem and then disband.)
>
> The group 'Publishing Data Workflows' might seem to be appropriate, but it's actually 'Workflows for Publishing Data', not 'Publishing of Data Workflows' (which falls under 'Data Provenance' and 'Data Citation').
>
> There was a presentation at the meeting earlier this week by Andreas Rauber in the Data Citation group on workflows using git or SQL databases to be able to track appending or modification for CSV and similar ASCII files.
>
> ...
>
> Also, I would consider this to be on-topic for Stack Exchange's Open Data site (and I'm one of the moderators for the site):
>
>   http://opendata.stackexchange.com/
>
> -Joe
>
>> On Tue, Mar 10, 2015 at 8:10 PM, Scancella, John j...@loc.gov wrote:
>>
>>> Dave,
>>>
>>> How are you getting the metadata streams? Are they actual stream objects, or files, or database dumps, etc?
>>>
>>> As for the tools, I have used a number of the ones you listed below. I personally prefer JIRA (and it is free for non-profits). If you are OK with editing in wiki syntax I would recommend MediaWiki (it is what powers Wikipedia). You could also take a look at continuous deployment technologies like virtual machines (VirtualBox), Linux containers (Docker), and rapid deployment tools (Ansible, Salt). Of course if you are doing lots of code changes you will want to test all of this continually (Jenkins).
>>>
>>> John Scancella
>>> Library of Congress, OSI
>>>
>>> -----Original Message-----
>>> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of davesgonechina
>>> Sent: Tuesday, March 10, 2015 6:05 AM
>>> To: CODE4LIB@LISTSERV.ND.EDU
>>> Subject: [CODE4LIB] Data Lifecycle Tracking Documentation Tools
>>>
>>> Hi all,
>>>
>>> One of my projects involves harvesting, cleaning and transforming steady streams of metadata from numerous publishers. It's an infinite loop but every cycle can be a little bit or significantly different. Many issue tracking tools are designed for a linear progression that ends in deployment, not a circular workflow, and I've not hit upon a tool or use strategy that really fits. The best illustration I've found so far of the type of workflow I'm talking about is the DCC Curation Lifecycle Model http://www.dcc.ac.uk/sites/default/files/documents/publications/DCCLifecycle.pdf .
>>>
>>> Here are some things I've tried or thought about trying:
>>>
>>> - Git comments
>>> - Github Issues
>>> - MySQL comments
>>> - Bash script logs
>>> - JIRA
>>> - Trac
>>> - Trello
>>> - Wiki
>>> - Unfuddle
>>> - Redmine
>>> - Zendesk
>>> - Request Tracker
>>> - Basecamp
>>> - Asana
>>>
>>> Thoughts?
>>>
>>> Dave
[CODE4LIB] Data Lifecycle Tracking Documentation Tools
Hi all,

One of my projects involves harvesting, cleaning and transforming steady streams of metadata from numerous publishers. It's an infinite loop but every cycle can be a little bit or significantly different. Many issue tracking tools are designed for a linear progression that ends in deployment, not a circular workflow, and I've not hit upon a tool or use strategy that really fits. The best illustration I've found so far of the type of workflow I'm talking about is the DCC Curation Lifecycle Model http://www.dcc.ac.uk/sites/default/files/documents/publications/DCCLifecycle.pdf .

Here are some things I've tried or thought about trying:

- Git comments
- Github Issues
- MySQL comments
- Bash script logs
- JIRA
- Trac
- Trello
- Wiki
- Unfuddle
- Redmine
- Zendesk
- Request Tracker
- Basecamp
- Asana

Thoughts?

Dave
Re: [CODE4LIB] Data Lifecycle Tracking Documentation Tools
Hi John,

Good question - we're taking in XLS, CSV, JSON, XML, and on a bad day PDF of varying file sizes, each requiring different transformation and audit strategies, on both regular and irregular schedules. New batches often feature schema changes requiring modification to ingest procedures, which we're trying to automate as much as possible but obviously require a human chaperone.

Mediawiki is our default choice at the moment, but then I would still be looking for a good workflow management model for the structure of the wiki, especially since in my experience wikis are often a graveyard for the best intentions.

Dave

On Tue, Mar 10, 2015 at 8:10 PM, Scancella, John j...@loc.gov wrote:

> Dave,
>
> How are you getting the metadata streams? Are they actual stream objects, or files, or database dumps, etc?
>
> As for the tools, I have used a number of the ones you listed below. I personally prefer JIRA (and it is free for non-profits). If you are OK with editing in wiki syntax I would recommend MediaWiki (it is what powers Wikipedia). You could also take a look at continuous deployment technologies like virtual machines (VirtualBox), Linux containers (Docker), and rapid deployment tools (Ansible, Salt). Of course if you are doing lots of code changes you will want to test all of this continually (Jenkins).
>
> John Scancella
> Library of Congress, OSI
>
> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of davesgonechina
> Sent: Tuesday, March 10, 2015 6:05 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] Data Lifecycle Tracking Documentation Tools
>
> Hi all,
>
> One of my projects involves harvesting, cleaning and transforming steady streams of metadata from numerous publishers. It's an infinite loop but every cycle can be a little bit or significantly different. Many issue tracking tools are designed for a linear progression that ends in deployment, not a circular workflow, and I've not hit upon a tool or use strategy that really fits. The best illustration I've found so far of the type of workflow I'm talking about is the DCC Curation Lifecycle Model http://www.dcc.ac.uk/sites/default/files/documents/publications/DCCLifecycle.pdf .
>
> Here are some things I've tried or thought about trying:
>
> - Git comments
> - Github Issues
> - MySQL comments
> - Bash script logs
> - JIRA
> - Trac
> - Trello
> - Wiki
> - Unfuddle
> - Redmine
> - Zendesk
> - Request Tracker
> - Basecamp
> - Asana
>
> Thoughts?
>
> Dave
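
On the point about new batches arriving with schema changes: whatever tracking tool ends up holding the notes, the detection step itself can be small. The sketch below is one possible approach for CSV batches only, comparing the header row of the previous batch with the current one; the function names and sample column names are hypothetical, and XLS, JSON, XML and PDF would each need their own equivalent.

```python
import csv
import io


def header_of(csv_text):
    """Return the first row of a CSV document as a list of column names."""
    return next(csv.reader(io.StringIO(csv_text)))


def schema_diff(previous, current):
    """Summarise how the column list changed between two batches."""
    prev, cur = set(previous), set(current)
    added, removed = cur - prev, prev - cur
    return {
        "added": sorted(added),
        "removed": sorted(removed),
        # Same columns, different order: harmless for DictReader-style
        # ingest, but it breaks anything that reads by position.
        "reordered": not (added or removed)
        and list(previous) != list(current),
    }


if __name__ == "__main__":
    old = header_of("isbn,title,publisher\n1,A,B\n")
    new = header_of("isbn,title,imprint,pub_date\n1,A,B,2015\n")
    print(schema_diff(old, new))
```

A non-empty diff is the cue for the human chaperone, and the dictionary itself can be pasted into whichever log, wiki page, or issue records that cycle.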
Re: [CODE4LIB] Streaming Copyrighted material
Hi all, Agreed with Brent regarding a cease and desist order coming long before any legal action, and agreed with Simon that under Aereo, and previous decisions, streaming is a performance and not distribution. FWIW I'm fairly certain that between the educational exemption for performance and display (Section 110(1)) and the fair use test (Section 107) libraries are on solid ground when streaming video through a third-party app (aren't you always through a browser?). The portion of the video saved temporarily on your local device is not infringement - the Cablevision decision carved out space for that and I don't believe Aereo changed that. Aereo was more about how transparently and frankly cynically they were adhering to the letter and not the spirit of the law to get around transmission fees, not whether caching is an act of piracy. If Kodi is simply providing a platform for accessing streaming content that is either made free on the web or for which the library has purchased an appropriate license/subscription (i.e. institutional not individual use), using Kodi or XBMC or some other tool as a discovery tool seems non-problematic. Dave On Mon, Dec 8, 2014 at 2:30 AM, Brent Hanner behan...@mediumaevum.com wrote: Sorry this took so long but been having a bunch of computer problems. Instead of trying to reply to bits of this I’m going to try to be more comprehensive. First thing is to understand a few things. The streaming aspect is far less important than where you are transfering it from and to. You have far more flexibility within the building then you do publicly over the internet. Just as an individual for personal use has more flexibility than a public corporation. This is where Areo tried to slide in and the Court disagreed with them. And libraries tend to fall somewhere in there having special exemptions to copyright granted by Congress but the laws don’t cover modern technical details. 
As long as you act in good faith you or your library will not get sued for two reasons. Firstly standard operating procedure is to send a cease and desist letter. So if you do skirt the limits realize it can happen and comply and then tell us what you did and what it said so the broader library community can decide where they stand. Secondly one of the last things a major content company wants is to sue a library. One thing that was clearly shown during surveys of people over the last few years is that while lots of people don’t actively use their library the public support for them is still very high. Thirdly they don’t want to sue a library because if they lose every library in the country will know what it can and cannot implement. And if they win they will face a legislative fight to expand what libraries can do. They are served far better by there not being clear rules, especially because librarians fear far more than they should. Part of this sort of thing in the long run is about managing bandwidth: with streaming video sucking up more and more bandwidth, finding ways of controlling it will be useful. Luckily Netflix has been working on an appliance to help everyone with this but I’d imagine it will be a few years before it gets down to a library level unless someone comes up with a completely open source solution we can implement ourselves. Someone mentioned network TV which brings up the really interesting space. There is an argument to be made for providing access to content that is freely available to the public. While you clearly could not stream it to other locations the software and hardware is readily available. So the question is does anyone know of any court cases or LOC/copyright guidelines from back in the days of VCR about libraries recording shows on video tape and providing access to those tapes.
The other thing to consider as a community is developing a catalog of videos that would be good to keep on servers in libraries that can be downloaded so they are more readily accessible without killing the library's bandwidth.
Brent
Sent from Windows Mail
From: Cornel Darden Jr. Sent: Tuesday, December 2, 2014 8:59 PM To: CODE4LIB@LISTSERV.ND.EDU
Hello,
Is streaming (viewing online) copyrighted material illegal for individuals? According to the copyright.gov website this seems to be completely legal for the viewer when there isn't a copy of the work on the viewer's computer. It only mentions hosting streams as being a misdemeanor, even if there isn't any profit. This is becoming a huge issue as more content consumers become cord cutters. Have any librarians faced these questions? I am planning on implementing Kodi in my library, but will only make public domain material accessible. Kodi provides an excellent user interface for organizing and viewing public domain material.
Thanks,
Cornel Darden Jr. MSLIS Library Department Chair South Suburban College
Re: [CODE4LIB] Anybody using pinboard?
I like the platform, but I think I really paid for Maciej's wit. http://idlewords.com/bt14.htm
On Thu, Nov 20, 2014 at 10:27 PM, Rogan Hamby rogan.ha...@yclibrary.net wrote:
I've been using it since fairly early days. I like it but don't get exceptionally fancy beyond my own esoteric taxonomy for defining my bookmarks.
On Thu, Nov 20, 2014 at 9:19 AM, Daniel Lovins daniel.lov...@nyu.edu wrote:
I've been using it for years as a personal bookmarking tool, and think it's excellent. Jason may be doing more complex things with it, though. - Daniel.
On Thu, Nov 20, 2014 at 9:11 AM, Brad Coffield bcoffield.libr...@gmail.com wrote:
https://pinboard.in/ First saw this in a webinar led by Jason Clark and thought it was cool. Thinking about it again and feel like I should do it. But I'm worried it's just my tendency to want it because it's something neato. Anybody using it and recommend it? (or signed up and regret it?) I already work evernote hard so I'm wondering if it's useful enough separate from that. Thanks!
-- Brad Coffield, MLIS Assistant Information and Web Services Librarian Saint Francis University 814-472-3315 bcoffi...@francis.edu
-- Daniel Lovins Head of Knowledge Access, Design Development Knowledge Access Resource Management Services New York University, Division of Libraries 20 Cooper Square, 3rd floor New York, NY 10003-7112 daniel.lov...@nyu.edu 212-998-2489
-- Rogan Hamby, MLS, CCNP, MIA Managers Headquarters Library and Reference Services, York County Library System “You can never get a cup of tea large enough or a book long enough to suit me.” ― C.S. Lewis http://www.goodreads.com/author/show/1069006.C_S_Lewis
[CODE4LIB] International CODEN Service
Does anyone use it, and how? Also, how much? Dave Lyons
Re: [CODE4LIB] 'automation' tools
+1 to OpenRefine. Some extensions, like RDF Refine http://refine.deri.ie/, currently only work with the old Google Refine (still available here https://code.google.com/p/google-refine/). There are a good many interesting projects for OpenRefine on GitHub and GitHub Gist. Google Docs Spreadsheets also has a surprising amount of functionality, such as importXML, if you're willing to get your hands dirty with XPath queries.
Dave
On Tue, Jul 8, 2014 at 3:12 AM, Tillman, Ruth K. (GSFC-272.0)[CADENCE GROUP ASSOC] ruth.k.till...@nasa.gov wrote:
Definite cosign on Open Refine. It's intuitive and spreadsheet-like enough that a lot of people can understand it. You can do anything from standardizing state names you get from a patron form to normalizing metadata keywords for a database, so I think it'd be useful even for non-techies.
Ruth Kitchin Tillman Metadata Librarian, Cadence Group NASA Goddard Space Flight Center Library, Code 272 Greenbelt, MD 20771 Goddard Library Repository: http://gsfcir.gsfc.nasa.gov/ 301.286.6246
-Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Terry Brady Sent: Monday, July 07, 2014 1:35 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] 'automation' tools
I learned about Open Refine http://openrefine.org/ at the Code4Lib conference, and it looks like it would be a great tool for normalizing data. I worked on a few projects in the past in which this would have been very helpful.
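For readers who want to see the kind of cleanup Ruth mentions (standardizing state names from a patron form) outside of OpenRefine, here is a minimal Python sketch. It is not how OpenRefine does it internally; the `STATE_NAMES` table and the `normalize_state` function are hypothetical names made up for illustration, and a real lookup table would need to cover every state and many more spelling variants.

```python
import re

# Hypothetical lookup table (illustration only): cleaned-up variant -> canonical name.
STATE_NAMES = {
    "md": "Maryland",
    "maryland": "Maryland",
    "va": "Virginia",
    "virginia": "Virginia",
    "dc": "District of Columbia",
    "district of columbia": "District of Columbia",
    "washington dc": "District of Columbia",
}


def normalize_state(raw):
    """Return the canonical state name, or the trimmed input if it is not in the table."""
    # Drop periods, trim, lowercase, and collapse runs of whitespace to one space.
    key = re.sub(r"\s+", " ", raw.replace(".", "").strip().lower())
    return STATE_NAMES.get(key, raw.strip())


if __name__ == "__main__":
    for value in [" Md. ", "D.C.", "virginia", " Ohio "]:
        print(repr(value), "->", normalize_state(value))
```

Unknown values are passed through unchanged (apart from trimming) so that nothing is silently rewritten; in practice you would probably want to log those for review.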
Re: [CODE4LIB] Web Therapy full-day preconference at ALA Annual
I can't help but point out that the examples for Web Therapy are mostly organizational and not Web-specific problems.
- “Our summer reading guides are totally out of control! How do we rein them in?”
- “I was put in charge of our cataloging when a colleague left the organization. How do I find time to manage it well in addition to my normal work?”
- “Our Board wants patron statistics, but I don’t know what reports to provide in a way that will make sense and tell them objectively what they want to know. Help!”
- “Campus security won’t let us install book drops; how are we supposed to provide a convenient way to return items for our students and faculty?”
Maybe not the best substitutions, but I read the originals as issues with staff coordination, role management, internal reporting, and inter-departmental conflicts. None of them, except maybe the LibGuides, is really a technical problem, and I'm wondering if this panel will actually be a forum for talking about the organizational dysfunction that often results when new technologies are integrated rather than discussing the technologies themselves.
Dave
On Thu, Jun 5, 2014 at 11:55 PM, McHale, Nina nina.mch...@rrcc.edu wrote:
**apologies for cross-posting**
Do any of these scenarios sound painfully familiar to you?
“Our LibGuides are totally out of control! How do we rein them in?”
“I was put in charge of our Drupal site when a colleague left the organization. How do I find time to manage it well in addition to my normal work?”
“Our Board wants web statistics, but I don’t know what reports to provide in a way that will make sense and tell them objectively what they want to know. Help!”
“Campus IT won’t let us install a CMS; how are we supposed to develop a robust library web site for our students and faculty?”
Take comfort in knowing that you are not alone!
If you are headed to Las Vegas for ALA in June, come join Chris Evjy (Jefferson County Public Library) and Nina McHale (Red Rocks Community College) and others who work on or manage library web sites for some Web Therapy! Bring your web woes to the table--specific topics will be determined by a survey sent in advance to attendees--and we’ll put our 20+ years of combined experience managing public, academic, and special library web sites to work to develop solutions. This is a great opportunity to work through complex issues in a small group setting. Free hugs!
To register:
* Register online http://ala14.ala.org/register-now through June 20
* Call ALA Registration at 1 (800) 974-3084
* Onsite registration will also be accepted in Las Vegas.
Nina McHale, MA, MA/MSLS Library Director Red Rocks Community College Buckels Library 13300 W. 6th Ave. Lakewood, CO 80228-1255 303.914.6747 http://rrcc.colibraries.org/ nina.mch...@rrcc.edu
Re: [CODE4LIB] convert MODS XML into CSV or tab-delimited text
LoC has XSLT stylesheets to convert MODS to DC, HTML, and MARCXML. http://www.loc.gov/standards/mods/mods-conversions.html There are also XML to CSV XSLT scripts out there, and there's this app which I tested on a MODS 3.0 record and it didn't look too bad: https://code.google.com/p/xml2csv-conv/
On Wed, Apr 23, 2014 at 5:04 AM, Bryan Baldus bryan.bal...@quality-books.com wrote:
On Tuesday, April 22, 2014 1:36 PM, Eben English wrote:
Does anyone out there have an XSL stylesheet to transform MODS XML into a CSV or tab-delimited text file? Even if it's highly localized to your own institution/project, it would probably still be useful.
I'm not sure how well it would work, but MarcEdit [1] has a MODS => MARC XML conversion option, and an option to Export Tab Delimited Records.
[1] http://marcedit.reeset.net/
I hope this helps,
Bryan Baldus Senior Cataloger Quality Books Inc. The Best of America's Independent Presses 1-800-323-4241x402 bryan.bal...@quality-books.com eij...@cpan.org http://home.comcast.net/~eijabb/
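If XSLT is not a hard requirement, the same flattening can be sketched in a few lines of Python using only the standard library. This is an alternative to the stylesheet Eben asked for, not a translation of any of the tools mentioned above. The `FIELDS` mapping and the `mods_to_csv` function are hypothetical names, the choice of four columns is arbitrary, and repeated elements are joined with `|` purely as an illustrative convention. Only the MODS v3 namespace URI is taken from the standard itself.

```python
import csv
import io
import xml.etree.ElementTree as ET

MODS_URI = "http://www.loc.gov/mods/v3"
MODS_NS = {"mods": MODS_URI}

# Hypothetical column choices: column name -> path relative to a <mods> record.
FIELDS = {
    "title": "mods:titleInfo/mods:title",
    "name": "mods:name/mods:namePart",
    "date_issued": "mods:originInfo/mods:dateIssued",
    "subject": "mods:subject/mods:topic",
}


def mods_to_csv(xml_text, delimiter=","):
    """Flatten one <mods> record or a <modsCollection> into delimited text."""
    root = ET.fromstring(xml_text)
    # Accept either a bare <mods> record or a <modsCollection> wrapper.
    if root.tag == "{%s}mods" % MODS_URI:
        records = [root]
    else:
        records = root.findall("mods:mods", MODS_NS)

    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(list(FIELDS))
    for record in records:
        row = []
        for path in FIELDS.values():
            # Collect every non-empty match; repeated elements share one cell.
            values = [
                el.text.strip()
                for el in record.findall(path, MODS_NS)
                if el.text and el.text.strip()
            ]
            row.append("|".join(values))
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        '<mods xmlns="http://www.loc.gov/mods/v3">'
        "<titleInfo><title>Test Title</title></titleInfo>"
        "<name><namePart>Doe, Jane</namePart></name>"
        "<originInfo><dateIssued>1999</dateIssued></originInfo>"
        "<subject><topic>Maps</topic></subject>"
        "<subject><topic>Boston</topic></subject>"
        "</mods>"
    )
    print(mods_to_csv(sample, delimiter="\t"))
```

Passing `delimiter="\t"` gives tab-delimited output. Real MODS is much richer than four columns (name roles, multiple title types, nested subjects), so the mapping would need to be extended for any actual collection.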
Re: [CODE4LIB] LibGuides: I don't get it
You guys are awesome, this is great stuff, really helpful. My impression of LibGuides has been fairly negative for many of the reasons mentioned, but Sean has a good point about content strategy and training, and Wilhemina has a good point about the costs of open source not always being appreciated. Has anyone tried the two platforms Andrew Darby mentioned, SubjectsPlus and Library a la Carte? That's the sort of thing I've been looking for but never found until now.
Dave
On Mon, Aug 12, 2013 at 9:57 PM, Sean Hannan shan...@jhu.edu wrote:
Again, this is not a technical issue. It's a content strategy issue. Believe me, I was where you were. I was using all kinds of javascript and CSS hacks to try to prevent people from getting creative with color. I was getting to the point of setting up Capybara tests to run against the guides to alert me to abusive uses of bold and italics. The folks creating guides are content people, not web people. Take the web out of it. Focus on the content. Pick a couple heuristics to educate them on (we picked 7 +/- 2, above the fold/below the fold, and F-shaped reading patterns). Above all, show them statistics. And not the built-in LibGuides stats, either. New vs. returning. Average time on page. Pageviews over the course of a year. Very, very, very quickly our librarians realized what content is important, what content is superfluous, and that the time they spend carefully manicuring and maintaining their guides would (and could) be better spent elsewhere.
-Sean
On 8/12/13 9:35 AM, Joshua Welker wel...@ucmo.edu wrote:
I just have to say I have been thinking the exact same thing about LibGuides for the two years I've been using it. I feel vindicated knowing others feel the same way. At UCMO, we will be migrating to Drupal in the next several months, and I am hoping very much that I can convince people to use LibGuides less. LibGuides is great in its ease of use, but fails on just about every design principle I can think of.
There have been several studies on tab blindness in LibGuides, and don't get me started on the sub-tab links that are hiding and require the user to mouse over a tab to even see what is there. I've tried telling people so many times to have just a few tabs and always to use a table of contents for the main page, but they rarely do. And it becomes just about impossible to have a consistent look and feel across your website when LibGuides allows guide creators to modify every element on the page as they see fit. People will do crazy things like putting page content in a sidebar element, something you'd never ever ever see on any website on the Internet. I tried to enforce uniform colors and column sizes across all the guides, but I was told to let it go because my coworkers wanted to be able to decide those things on a guide-by-guide basis. I've worked at two institutions that use LibGuides, and what inevitably happens is that librarians create one Uber Guide for entire subject areas (biology, religion, etc) and then create sub-pages for all the dozens of specific disciplines within those subject areas. And then, assuming the user somehow manages to find these pages, they are typically not much more than a list of links that could have easily been included on the main library website. Okay, sorry for the rant. It has been building up for several years and never had a chance to voice out. Josh Welker Information Technology Librarian James C. Kirkpatrick Library University of Central Missouri Warrensburg, MO 64093 JCKL 2260 660.543.8022 -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Robert Sebek Sent: Sunday, August 11, 2013 11:21 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] LibGuides: I don't get it On Sun, Aug 11, 2013 at 9:54 AM, Heather Rayl 23e...@gmail.com wrote: I have to say that I loathe LibGuides. My library makes extensive use of them, too. Need a web solution? 
The first thing out of someone's mouth is "Let's put it in a LibGuide!" *Shudder* This fall, I'll be moving our main site over to Drupal, and I'm hoping that eventually I can convince people to re-invent their LibGuides there. I can use the "saving money" card, and the "content silos are bad" card and *maybe* I will be successful. Anyone fought this particular battle before? ~heather
I'm fighting that battle right now. We have an excellent CMS into which I have set up all our database URLs, descriptions, etc. Anytime we need to refer to a database on a page, we use one of those entries. That database just changed platforms? No problem. I change the URL in one place and everything automatically updates (hooray CMSs!). All of our subject guides (http://www.lib.vt.edu/subject-guides/) are in the CMS using the exact same database entries. I converted from our
[CODE4LIB] LibGuides: I don't get it
I've not had an opportunity to use LibGuides, but I've seen a few and read the features list on the Springshare site. All I see is a less flexible WordPress at a higher price point. What advantages am I not seeing? If there aren't any, is it the case that once signed up, migration to an open source platform is just not worth it for most institutions?
Re: [CODE4LIB] Schema for Continuing (web) Resources
Sorry for taking a while to respond Matt, busy week. Initially the resources would be journals, databases, galleries, digital collections, language learning tools, dictionaries, statistical yearbooks, and similar online resources for China Studies. I have a Pinboard list for the sorts of things I plan to add: http://pinboard.in/u:davesgonechina/t:zongmu/ The goal is to have a curated collection of links to collections (not crawl every item, that can be a later project), with faceted search so that users can narrow down on resources of a particular format, time period, geographic region, etc.
Dave
On Sat, Jul 27, 2013 at 3:03 AM, Matthew Sherman matt.r.sher...@gmail.com wrote:
Just to move your discussion along a bit, plus I think it sounds pretty interesting, what sort of resources are you talking about? Knowing what you are working with can give everyone a better idea of what schemas would work best. I know MARC is not so friendly for online resources, but it depends on what the item is. Just off the cuff, Dublin Core is probably your best bet due to its extensibility, but again it depends on what you are working with.
Matt
On Fri, Jul 26, 2013 at 10:10 AM, davesgonechina davesgonech...@gmail.com wrote:
I'm trying to develop a curated site listing online resources for China scholars. Ideally I'd like to use a metadata schema that other libraries export as MARC, DC, or other standards they may use, and maybe also linked data-capable. Any suggestions? I'm experimenting with Drupal but my platform choice will probably be driven by my schema.
Dave
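To make the faceted-search goal in this thread a little more concrete, here is a rough Python sketch of what faceting over Dublin Core-style records could look like. It is only an illustration under stated assumptions: the records, titles, and example.org URLs are invented, the `facet_counts` and `filter_by` helpers are hypothetical, and the dictionary keys simply borrow Dublin Core element names (`title`, `type`, `coverage`, `language`, `identifier`). It says nothing about how Drupal or any particular platform would implement facets.

```python
from collections import Counter

# Invented sample records; keys borrow Dublin Core element names.
RESOURCES = [
    {
        "title": "Example Gazetteer Database",
        "type": "database",
        "coverage": "Qing dynasty",
        "language": "zh",
        "identifier": "https://example.org/gazetteers",
    },
    {
        "title": "Example Statistical Yearbook",
        "type": "yearbook",
        "coverage": "PRC",
        "language": "zh",
        "identifier": "https://example.org/yearbook",
    },
    {
        "title": "Example Newspaper Archive",
        "type": "database",
        "coverage": "Republican era",
        "language": "en",
        "identifier": "https://example.org/newspapers",
    },
]


def facet_counts(records, field):
    """Count how many records carry each value of one facet field."""
    return Counter(r[field] for r in records if field in r)


def filter_by(records, **facets):
    """Keep only the records that match every requested facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in facets.items())]


if __name__ == "__main__":
    print(facet_counts(RESOURCES, "type"))
    for r in filter_by(RESOURCES, type="database", language="zh"):
        print(r["title"], r["identifier"])
```

The point of the sketch is that whichever schema is chosen, the facets users narrow on (format, time period, region) need to be stored as consistent, controlled values, which is a schema decision more than a platform one.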
[CODE4LIB] Schema for Continuing (web) Resources
I'm trying to develop a curated site listing online resources for China scholars. Ideally I'd like to use a metadata schema that other libraries export as MARC, DC, or other standards they may use, and maybe also linked data-capable. Any suggestions? I'm experimenting with Drupal but my platform choice will probably be driven by my schema. Dave
Re: [CODE4LIB] Libraries and IT Innovation
Some thoughts. BTW, new to the list - librarian working for a study-abroad program in Beijing here, building a new catalog with Koha these days and previously did competitive intelligence for investors looking at China's IT industries. I appreciate Matt trying to start an open-ended conversation about innovation and thought I'd toss my own rant in the ring. One of the things that really struck me about libraries when studying for my MLIS was how much library systems were designed primarily for the backend and not consumer-facing until post-Internet, and built and maintained by third parties that aren't practicing or even trained librarians (and charging a pretty penny for it). There's a lot of catch-up going on by a profession that outsourced these skill sets and is now rebuilding through groups like CODE4LIB, hence we may be behind the curve on innovation for a long time. I'm not sure how much Big Data really comes into play for most libraries. You might need terabytes of cloud storage for a digital preservation project, but considering the bulk of that would be the digitized images/videos/recordings themselves, each with a metadata record, you don't necessarily have a very large or complex data structure. How many library projects are "beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time"? I'm honestly not sure, and I wonder about the nebulous definition. What is "commonly used"? Hadoop? On the other hand preserving Big Data, say from the Large Hadron Collider, and creating discovery tools for future researchers, is something that librarians could potentially be involved in, but if CERN already built the database and discovery tools before it reached the library, did we miss the game? Do Big Data projects say to themselves in the planning stage "We need a librarian"? Should they? If so, are we ready?
Then there's the privacy issue: Even before Snowden, the ALA Code of Ethics bumped up against the power of crunching user data for recommendation systems and the like. Even if you adequately anonymize your data, taking it only in aggregate, it goes against the grain of traditional library culture. Any discussion of retaining user social profiles, search history, or activity tracking means talking about patron rights to anonymity. The goal I've been fixated on for library software development has been to deliver staff and patron-friendly open-source cataloging, discovery, and curation tools for libraries that take back control of our systems from closed corporate vendors, provide a user experience that matches or exceeds expectations created in the marketplace, and remain committed to the ethical standards and social contract traditionally held by libraries in our society. When you consider that most of the professional news industry delivers information discovery services using Drupal, Django, or Wordpress, why can't there be robust ecosystems like these for libraries? Hope I didn't bore anyone. Dave Lyons Digital Librarian The Beijing Center for Chinese Studies