Re: [CODE4LIB] Retrieving ISSN using a DOI
You should be able to use the content negotiation support on Crossref to get the metadata, which does include the ISSNs - or at least has the potential to if they are available. E.g.

curl -LH "Accept: application/rdf+xml;q=0.5, application/vnd.citationstyles.csl+json;q=1.0" http://dx.doi.org/10.1126/science.169.3946.635

gives:

{
  "subtitle": [],
  "subject": [ "General" ],
  "issued": { "date-parts": [ [ 1970, 8, 14 ] ] },
  "score": 1.0,
  "prefix": "http://id.crossref.org/prefix/10.1126",
  "author": [ { "family": "Frank", "given": "H. S." } ],
  "container-title": "Science",
  "page": "635-641",
  "deposited": { "date-parts": [ [ 2011, 6, 27 ] ], "timestamp": 130913280 },
  "issue": "3946",
  "title": "The Structure of Ordinary Water: New data and interpretations are yielding new insights into this fascinating substance",
  "type": "journal-article",
  "DOI": "10.1126/science.169.3946.635",
  "ISSN": [ "0036-8075", "1095-9203" ],
  "URL": "http://dx.doi.org/10.1126/science.169.3946.635",
  "source": "CrossRef",
  "publisher": "American Association for the Advancement of Science (AAAS)",
  "indexed": { "date-parts": [ [ 2013, 11, 7 ] ], "timestamp": 1383796678887 },
  "volume": "169"
}

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 5 Mar 2014, at 12:30, Graham, Stephen wrote:

> OK, I've received a couple of emails telling me that the ISSN is not always included in the DOI - that it depends on the publisher. So, I guess my original question still stands!
>
> Stephen
>
> From: Graham, Stephen
> Sent: 05 March 2014 12:25
> To: 'CODE4LIB@LISTSERV.ND.EDU'
> Subject: RE: Retrieving ISSN using a DOI
>
> Sorry - I've answered my own question. The ISSN is actually contained in the DOI. Didn't realise this! D'oh!
>
> Stephen
>
> From: Graham, Stephen
> Sent: 05 March 2014 12:14
> To: 'CODE4LIB@LISTSERV.ND.EDU'
> Subject: Retrieving ISSN using a DOI
>
> Hi All - is there a service/API that will return the ISSN if I provide the DOI? I was hoping that the Crossref API would do this, but I can't see the ISSN in the JSON it returns.
>
> I'm adding a DOI field to our OPAC ILL form, so if the user has the DOI they can use this to populate the form rather than add all the data manually. When the user submits the form I'm querying our openURL resolver API to see if we have access to the article. If we do then the form will alert the user and provide a link. The query to the openURL resolver works better if we have the ISSN, but if the user has used a DOI the ISSN is frustratingly never there.
>
> Stephen
>
> Stephen Graham
> Online Information Manager
> Information Collections and Services
> University of Hertfordshire, Hatfield. AL10 9AB
> Tel. 01707 286111
> Email s.grah...@herts.ac.uk
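[Editorial note: the approach described above is easy to script. A minimal Python sketch, standard library only; the helper names are illustrative, and the live lookup assumes the DOI resolver honours the Accept header as in the curl example.]

```python
import json
import urllib.request

def fetch_csl_json(doi):
    """Resolve a DOI via content negotiation, returning CSL JSON metadata.
    Assumes the resolver honours the Accept header as in the curl example."""
    req = urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"Accept": "application/vnd.citationstyles.csl+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def issns_from_csl(record):
    """Return the list of ISSNs from a CSL JSON record, or [] if absent."""
    issn = record.get("ISSN", [])
    return issn if isinstance(issn, list) else [issn]

# Extraction step shown against the record quoted above (no network needed):
sample = {"ISSN": ["0036-8075", "1095-9203"], "container-title": "Science"}
print(issns_from_csl(sample))   # prints ['0036-8075', '1095-9203']
```

The extracted ISSNs could then be passed straight into an OpenURL resolver query, as in Stephen's ILL-form use case.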
Re: [CODE4LIB] tool for finding close matches in vocabulary list
As Roy suggests, OpenRefine is designed for this type of work and could easily deal with the volume you are talking about here. It can cluster terms using a variety of algorithms and easily apply a set of standard transformations. The screencasts and info at http://freeyourmetadata.org/cleanup/ might be a good starting point if you want to see what Refine can do.

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 21 Mar 2014, at 18:24, Ken Irwin wrote:

> Hi folks,
>
> I'm looking for a tool that can look at a list of all subject terms in a poorly-controlled index as possible candidates for term consolidation. Our student newspaper index has about 16,000 subject terms and they include a lot of meaningless typographical and nomenclatural difference, e.g.:
>
> Irwin, Ken
> Irwin, Kenneth
> Irwin, Mr. Kenneth
> Irwin, Kenneth R.
>
> Basketball - Women
> Basketball - Women's
> Basketball-Women
> Basketball-Women's
>
> I would love to have some sort of pattern-matching tool that's smart about this sort of thing, that could go through the list of terms (as a text list, database, XML file, or whatever structure it wants to ingest) and spit out some clusters of possible matches.
>
> Does anyone know of a tool that's good for that sort of thing?
>
> The index is just a bunch of MySQL tables - there is no real controlled-vocab system, though I've recently built some systems to suggest known SHs to reduce this sort of redundancy.
>
> Any ideas?
>
> Thanks!
> Ken
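[Editorial note: for a flavour of what OpenRefine's "key collision" clustering does, here is a rough Python sketch of fingerprint clustering. The function names are made up, and OpenRefine's own implementation differs in detail.]

```python
import re
from collections import defaultdict

def fingerprint(term):
    """Fingerprint a term in the style of OpenRefine's default key-collision
    method: lowercase, strip punctuation, then sort the unique tokens."""
    tokens = re.sub(r"[^\w\s]", " ", term.lower()).split()
    return " ".join(sorted(set(tokens)))

def clusters(terms):
    """Group terms whose fingerprints collide; keep only groups of two or more."""
    groups = defaultdict(list)
    for term in terms:
        groups[fingerprint(term)].append(term)
    return [group for group in groups.values() if len(group) > 1]

print(clusters(["Basketball - Women", "Basketball-Women", "Irwin, Ken"]))
# prints [['Basketball - Women', 'Basketball-Women']]
```

Running this over Ken's 16,000 terms would surface punctuation- and spacing-only variants; catching "Kenneth" vs "Ken" needs the fuzzier (e.g. nearest-neighbour) methods OpenRefine also offers.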
Re: [CODE4LIB] semantic web browsers
Your findings reflect my experience - there isn't much out there, and what there is is basic or doesn't work at all. Link Sailor is another: http://linksailor.com - but I suspect it is not actively maintained (developed by Ian Davis when he was at Talis doing linked data work).

I think the Graphite-based browser from Southampton *does* support content negotiation - what makes you think it doesn't?

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 22 Mar 2014, at 20:49, Eric Lease Morgan wrote:

> Do you know of any working Semantic Web browsers?
>
> Below is a small set of easy-to-use Semantic Web browsers. Give them URIs and they allow you to follow and describe the links they include.
>
> * LOD Browser Switch (http://browse.semanticweb.org) - This is really a gateway to other Semantic Web browsers. Feed it a URI and it will create lists of URLs pointing to Semantic Web interfaces, but many of the URLs (Semantic Web interfaces) do not seem to work. Some of the resulting URLs point to RDF serialization converters.
>
> * LodLive (http://en.lodlive.it) - This Semantic Web browser allows you to feed it a URI and interactively follow the links associated with it. URIs can come from DBpedia, Freebase, or one of your own.
>
> * Open Link Data Explorer (http://demo.openlinksw.com/rdfbrowser2/) - The most sophisticated Semantic Web browser in this set. Given a URI it creates various views of the resulting triples associated with it, including lists of all its properties and objects, network graphs, tabular views, and maps (if the data includes geographic points).
>
> * Quick and Dirty RDF browser (http://graphite.ecs.soton.ac.uk/browser/) - Given the URL pointing to a file of RDF statements, this tool returns all the triples in the file and verbosely lists each of their predicate and object values. Quick and easy. This is good for reading everything about a particular resource. The tool does not seem to support content negotiation.
>
> If you need some URIs to begin with, then try some of these:
>
> * Ray Family Papers - http://infomotions.com/sandbox/liam/data/mum432.rdf
> * Catholics and Jews - http://infomotions.com/sandbox/liam/data/shumarc681792.rdf
> * Walt Disney via VIAF - http://viaf.org/viaf/36927108/
> * origami via the Library of Congress - http://id.loc.gov/authorities/subjects/sh85095643
> * Paris from DBpedia - http://dbpedia.org/resource/Paris
>
> To me, this seems like a really small set of browser possibilities. I've seen others but could not get them to work very well. Do you know of others? Am I missing something significant?
>
> —
> Eric Lease Morgan
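[Editorial note: the content negotiation question in this thread can be checked directly from a script. A small Python sketch, standard library only; the function names are made up, and the live check (`supports_conneg`) obviously needs network access.]

```python
import urllib.request

RDF_TYPES = ("text/turtle", "application/rdf+xml", "application/ld+json")

def is_rdf_response(content_type):
    """True if a Content-Type header value names a common RDF serialisation."""
    return content_type.split(";")[0].strip().lower() in RDF_TYPES

def supports_conneg(uri, accept="text/turtle"):
    """Ask a URI for Turtle and report whether an RDF type came back."""
    req = urllib.request.Request(uri, headers={"Accept": accept})
    with urllib.request.urlopen(req) as resp:
        return is_rdf_response(resp.headers.get("Content-Type", ""))

print(is_rdf_response("text/turtle; charset=utf-8"))   # prints True
print(is_rdf_response("text/html; charset=utf-8"))     # prints False
```

Pointing `supports_conneg` at the example URIs above (e.g. the VIAF or id.loc.gov ones) is a quick way to see which servers actually negotiate RDF.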
[CODE4LIB] Research Libraries UK Hack day
Just over a year and a half ago I posted about some work I was doing on behalf of Research Libraries UK (RLUK), who were looking at the potential of publishing several million of their bibliographic records (drawn from the major research libraries in the UK) as linked open data. In August last year RLUK announced it would join The European Library (TEL)[1], and would work with the team at TEL to publish RLUK data, along with other data held by The European Library, as linked open data.

I'm happy to say that they are now very close to making the (approximately) 17 million RLUK records available. To start the process of working with the wider community of librarians, developers, and anyone interested in exploiting this data, RLUK is holding a hack day in London on 14th May. Here the RLUK Linked Open Data will be introduced, along with the TEL API (OpenSearch based).

There will be prizes (to be announced) for hacks in the following areas, which represent areas of interest to RLUK and TEL:

• Linking up datasets - a prize for work that combines data from multiple data sets
• WWI
• Eastern Europe
• Delivering a valuable hack for RLUK members

The event is free and you can sign up now at https://www.eventbrite.co.uk/e/rluk-hack-day-rlukhack-tickets-11197529111 - I hope to see some of you there.

Best wishes

Owen

1. http://www.rluk.ac.uk/news/rluk-joins-european-library/

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936
Re: [CODE4LIB] distributed responsibility for web content
I'd second the suggestions from Erin with regard to establishing style guides, and Ross's suggestion of peer review. While not directly about the issue you have, Paul Boag, a UK web designer, has spoken and blogged on how policies relying on quantitative measures can help take some of the emotion out of decision making - e.g. see http://boagworld.com/business-strategy/website-animal/ - perhaps a similar approach might help here as well.

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 18 Apr 2014, at 15:15, Erin White wrote:

> Develop a brief content and design style guide, then have it approved by your leadership team and share it with your organization. (Easier said than done, I know.) Bonus points if you work with your (typically) print-focused communications person to develop this guide and get his/her buy-in on creating content for the web.
>
> A style guide sets expectations across the board and helps you when you need to play the heavy. As you need, you can e-mail folks with a link to the style guide, ask them to revise, and offer assistance or suggestions if they want.
>
> Folks are grumpy about this at first, but generally appreciate the overall strategy to make the website more consistent and professional-looking. It ain't the wild wild west anymore - our web content is both functional and part of an overall communications strategy, and we need to treat it accordingly.
>
> --
> Erin White
> Web Systems Librarian, VCU Libraries
> 804-827-3552 | erwh...@vcu.edu | www.library.vcu.edu
>
>
> On Fri, Apr 18, 2014 at 9:39 AM, Pikas, Christina K. <christina.pi...@jhuapl.edu> wrote:
>
>> Laughing and feeling your pain... we have a communications person (that's her job) who keeps using bold, italics, h1, in pink (yes pink), randomly in pages... luckily she only does internal pages, and not external.
>>
>> You could schedule some writing-for-the-web sessions, but I don't know that it will help. You could remove any text formatting... In the end, you probably should just do as I do: close the page, breathe deeply, get up and take a walk, and get on with other things.
>>
>> Christina
>>
>> -Original Message-
>> From: Code for Libraries [mailto:CODE4LIB@listserv.nd.edu] On Behalf Of Simon LeFranc
>> Sent: Thursday, April 17, 2014 7:43 PM
>> To: CODE4LIB@listserv.nd.edu
>> Subject: [CODE4LIB] distributed responsibility for web content
>>
>> My organization has recently adopted an enterprise Content Management System. For the first time, staff across 8 divisions became web authors, given responsibility for their division's web pages. Training on the software, which has a WYSIWYG interface for editing, is available and with practice, all are capable of mastering the basic tools. Some simple style decisions were made for them; however, it is extremely difficult to get these folks not to elaborate on or improvise new styles. Examples:
>>
>> - making text red or another color in the belief that color will draw readers' attention
>> - making text bold and/or italic and/or the size of a war-is-declared headline (see 1)
>> - using images that are too small to be effective
>> - adding a few more images that are too small to be effective
>> - attempting to emphasize statements using ! or !! or !!!
>> - writing in a too-informal tone ("Come on in outta the rain!") [We are a research organization and museum.]
>> - feeling compelled to ornament pages with clipart, curlicues, et al.
>> - centering everything
>>
>> There is no one person in the organization with the time or authority to act as editorial overseer. What are some techniques for ensuring that the site maintains a clean, professional appearance?
>>
>> Simon
Re: [CODE4LIB] barriers to open metadata?
Hi Laura,

I've done some work on this in the UK[1][2] and there have been a number of associated projects looking at the open release of library, archive and museum metadata[3]. For libraries (it is different for archives and museums) I think I'd sum up the reasons in three ways, in order of how commonly I think they apply:

a. Ignorance/lack of thought - libraries don't tend to licence their metadata, and often make no statement about how it can be used. My experience is that often no-one has even asked the questions about licencing/data release.

b. No business case - in the UK we talked to a group of university librarians and found that they didn't see a compelling business case for making open data releases of their catalogue records.

c. Concern about breaking contractual agreements or impinging on 3rd party copyright over records. The COMET project at the University of Cambridge did a lot of work in this area[4].

As Roy notes, there have been some significant changes recently, with OCLC and many national libraries releasing data under open licences. However, while this changes (c), it doesn't impact so much on (a) and (b) - so these remain as fundamental issues, and I have an (unsubstantiated) concern that big data releases lead to libraries taking less interest ("someone else is doing this for us") rather than taking advantage of the clarity and openness these big data releases and associated announcements bring.

A final point - looking at libraries' behaviour in relation to institutional/open access repositories, where you'd expect at least (a) to be considered, unfortunately when I looked a couple of years ago I found similar issues. Working for the CORE project at the Open University[5], I found that OpenDOAR[6] listed "Metadata re-use policy explicitly undefined" for 57 out of 125 UK repositories with OAI-PMH services. Only 18 repositories were listed as permitting commercial re-use of metadata. Hopefully this has improved in the intervening 2 years!
Hope some of this is helpful

Owen

1. Jisc Guide to Open Bibliographic Data http://obd.jisc.ac.uk
2. Jisc Discovery principles http://discovery.ac.uk/businesscase/principles/
3. Jisc Discovery case studies http://guidance.discovery.ac.uk
4. COMET http://cul-comet.blogspot.co.uk/p/ownership-of-marc-21-records.html
5. CORE blog http://core-project.kmi.open.ac.uk/node/32
6. OpenDOAR http://www.opendoar.org/

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 29 Apr 2014, at 21:06, Ben Companjen wrote:

> Hi Laura,
>
> Here are some reasons I may have overheard.
>
> Stuck halfway: "We have an OAI-PMH endpoint, so we're open, right?"
>
> Lack of funding for sorting out our own rights: "We gathered metadata from various sources and integrated the result - we even call ourselves Open L*y - but we [don't have manpower to figure out what we can do with it, so we added a disclaimer]."
>
> Cultural: "We're not sure how to prevent losing the records' provenance after we released our metadata."
>
> Groeten van Ben
>
> On 29-04-14 19:02, "Laura Krier" wrote:
>
>> Hi Code4Libbers,
>>
>> I'd like to find out from as many people as are interested what barriers you feel exist right now to you releasing your library's bibliographic metadata openly. I'm curious about all kinds of barriers: technical, political, financial, cultural. Even if it seems obvious, I'd like to hear about it.
>>
>> Thanks in advance for your feedback! You can send it to me privately if you'd prefer.
>>
>> Laura
>>
>> --
>> Laura Krier
>>
>> laurapants.com
Re: [CODE4LIB] Any good "introduction to SPARQL" workshops out there?
I contributed to a session like this in the UK aimed at cataloguers/metadata librarians: http://www.cilip.org.uk/cataloguing-and-indexing-group/events/linked-data-what-cataloguers-need-know-cig-event. All the slide decks used are available at http://www.cilip.org.uk/cataloguing-and-indexing-group/linked-data-what-cataloguers-need-know

Specifically, my introduction to SPARQL slides are at http://www.slideshare.net/ostephens/selecting-with-sparql-using-british-national-bibliography-as, and link to various example SPARQL queries that can be run on the BNB SPARQL endpoint (the SPARQL examples are all Gists at https://gist.github.com/ostephens).

Not sure about the practicalities of bringing this to staff in the US, although planning is in progress for another event in the UK along the same lines, and I'd be happy to put you in touch with the relevant people on the committee to see if there is any possibility of having it webcast if there was interest.

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 1 May 2014, at 17:23, Hutt, Arwen wrote:

> We're interested in an introduction to SPARQL workshop for a smallish group of staff. Specifically an introduction for fairly tech-comfortable non-programmers (in our case metadata librarians), as well as a refresher for programmers who aren't using it regularly.
>
> Ideally (depending on cost) we'd like to bring the workshop to our staff, since it'll allow more people to attend, but any recommendations for good introductory workshops or tutorials would be welcome!
>
> Thanks!
> Arwen
>
> Arwen Hutt
> Head, Digital Object Metadata Management Unit
> Metadata Services, Geisel Library
> University of California, San Diego
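[Editorial note: by way of illustration, a beginner-level query of the kind covered in these workshops can be sent to the BNB endpoint as a plain HTTP GET. A Python sketch; the endpoint URL is the one in use at the time of writing, and the query is a toy example, not one of the linked Gists.]

```python
import urllib.parse

BNB_ENDPOINT = "http://bnb.data.bl.uk/sparql"

# A toy query: ten titles from the British National Bibliography.
QUERY = """\
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?book ?title
WHERE { ?book dct:title ?title }
LIMIT 10
"""

def sparql_url(endpoint, query, fmt="application/sparql-results+json"):
    """Build a GET URL for a SPARQL endpoint (most accept ?query=...&format=...)."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query, "format": fmt})

print(sparql_url(BNB_ENDPOINT, QUERY))
```

Fetching the resulting URL (e.g. with urllib or curl) returns SPARQL results as JSON, which is a gentle way in for non-programmers before they move on to writing queries in an endpoint's own web form.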
Re: [CODE4LIB] Is ISNI / ISO 27729:2012 a name identifier or an entity identifier?
An aside, but interesting to see how some of this identity stuff seems to be playing out in the wild now.

Google for Catherine Sefton: https://www.google.co.uk/search?q=catherine+sefton

The Knowledge Graph displays information about Martin Waddell. Catherine Sefton is a pseudonym of Martin Waddell. It is impossible to know, but the most likely source of this knowledge is Wikipedia, which includes the ISNI for Catherine Sefton in the Wikipedia page for Martin Waddell (http://en.wikipedia.org/wiki/Martin_Waddell) (although, oddly, not the ISNI for Martin Waddell under his own name).

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 18 Jun 2014, at 23:28, Stuart Yeates wrote:

> My reading of that suggests that http://isni-url.oclc.nl/isni/000122816316 shouldn't have both "Bell, Currer" and "Brontë, Charlotte", which it clearly does...
>
> Is this a case of one of our sources of truth not distinguishing between identities and entities, and us allowing it to pollute our data?
>
> If that source of truth is wikipedia, we can fix that.
>
> cheers
> stuart
>
> On 06/19/2014 12:11 AM, Richard Wallis wrote:
>> Hi all,
>>
>> Seeing this thread I checked with the ISNI team and got the following answer from Janifer Gatenby, who asked me to post it on her behalf:
>>
>> ISNI identifies “public identities”. The scope as stated in the standard is:
>>
>> “This International Standard specifies the International Standard name identifier (ISNI) for the identification of public identities of parties; that is, the identities used publicly by parties involved throughout the media content industries in the creation, production, management, and content distribution chains.”
>>
>> The relevant definitions are:
>>
>> 3.1 party: natural person or legal person, whether or not incorporated, or a group of either
>>
>> 3.3 public identity: identity of a party (3.1) or a fictional character that is or was presented to the public
>>
>> 3.4 name: character string by which a public identity (3.3) is or was commonly referenced
>>
>> A party may have multiple public identities, and a public identity may have multiple names (e.g. pseudonyms).
>>
>> ISNI data is available as linked data. There are currently 8 million ISNIs assigned and 16 million links.
>>
>> ~Richard.
>>
>> On 16 June 2014 10:54, Ben Companjen wrote:
>>
>>> Hi Stuart,
>>>
>>> I don't have a copy of the official standard, but from the documents on the ISNI website I remember that there are name variations and 'public identities' (as the lemma on Wikipedia also uses). I'm not sure where the borderline is or who decides when different names are different identities.
>>>
>>> If it were up to me: pseudonyms are definitely different public identities, name changes after marriage probably not, name change after gender change could mean a different public identity.
Different public >>> identities get different ISNIs; the ISNI organisation says the ISNI system >>> can keep track of connected public identities. >>> >>> Discussions about name variations or aliases are not new, of course. I >>> remember the discussions about 'aliases' vs 'Artist Name Variations' that >>> are/were happening on Discogs.com, e.g. 'is J Dilla an alias or a ANV of >>> Jay Dee?' It appears the users on Discogs finally went with aliases, but >>> VIAF put the names/identities together: http://viaf.org/viaf/32244000 - >>> and there is no ISNI (yet). >>> >>> It gets more confusing when you look at Washington Irving who had several >>> pseudonyms: they are just listed under one ISNI. Maybe because he is dead, >>> or because all other databases already know and connected the pseudonyms >>> to the birth name? (I just sent a comment asking about the record at >>> http://isni.org/isni/000121370797 ) >>> >>> >>> [Here goes the reference list…] >>> >>> Hope this helps :) >>> >>> Groeten van Ben &
[CODE4LIB] 'automation' tools
I'm doing a workshop in the UK at a library tech unconference-style event (Pi and Mash http://piandmash.info) on automating computer-based tasks. I want to cover tools that are usable by non-programmers and that would work in a typical library environment. The types of tools I'm thinking of are:

MacroExpress
AutoHotKey
iMacros for Firefox

While I'm hoping workshop attendees will bring ideas about tasks they would like to automate, the type of thing I have in mind is things like:

Filling out a set of standard data on a GUI or Web form (e.g. standard set of budget codes for an order)
Processing a list of item barcodes from a spreadsheet and doing something with them on the library system (e.g. change loan status, check for holds)
Similarly for User IDs
Navigating to a web page and doing some task

Clearly some of these tasks would be better automated with appropriate APIs and scripts, but I want to try to introduce those without programming skills to some of the concepts and tools, and essentially how they can work around problems themselves to some extent.

What tools do you use for this kind of automation task, and what kinds of tasks do they best deal with?

Thanks,

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936
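[Editorial note: for the "list of item barcodes from a spreadsheet" case, the spreadsheet-reading half is a few lines of Python once the sheet is exported as CSV; the cleaned list can then be handed to a macro tool or an API. The column name here is just an assumption about the export.]

```python
import csv
import io

def barcodes_from_csv(fileobj, column="barcode"):
    """Pull the barcode column out of a CSV export, skipping blank cells."""
    return [row[column].strip()
            for row in csv.DictReader(fileobj)
            if row.get(column, "").strip()]

# Example with an in-memory stand-in for a spreadsheet export:
sample = io.StringIO("barcode,title\n30010001,Book A\n,Book B\n30010002,Book C\n")
print(barcodes_from_csv(sample))   # prints ['30010001', '30010002']
```

In a workshop setting the same loop could paste each barcode into the library system's search box via a keyboard macro, which is exactly the gap tools like AutoHotKey fill for non-programmers.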
Re: [CODE4LIB] 'automation' tools
Thanks Riley and Andrew for these pointers - some great stuff in there. Other tools and examples still very welcome :)

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 4 Jul 2014, at 15:04, Andrew Weidner wrote:

> Great idea for a workshop, Owen.
>
> My staff and I use AutoHotkey every day. We have some apps for data cleaning in the CONTENTdm Project Client that I presented on recently: http://scholarcommons.sc.edu/cdmusers/cdmusersMay2014/May2014/13/. I'll be talking about those in more detail at the Upper Midwest Digital Collections Conference <http://www.wils.org/news-events/wilsevents/umdcc/> if anyone is interested.
>
> I did an in-house training session for our ILS and database management folks on a simple AHK app that they now use for repetitive data entry: https://github.com/metaweidner/AutoType. When I was working with digital newspapers I developed a suite of tools for making repetitive quality review tasks easier: https://github.com/drewhop/AutoHotkey/wiki/NDNP_QR
>
> Basic AHK scripts are really great for text wrangling. Just yesterday I wrote a script to grab some values from a spreadsheet, remove commas from the numbers, and dump them into a tab-delimited file in the format that we need. That script will become part of our regular workflow. Wrote another one-off script to transform labels on our wiki into links. It wrapped the labels in the wiki link syntax, and then I copied and pasted the unique URLs into the appropriate spots.
>
> It's also useful for keeping things organized. I have a set of scripts that open up frequently used network drive folders and applications, and I packaged them as drop-down menu choices in a little GUI that's always open on the desktop. We have a few search scripts that either grab values from a spreadsheet or input box and then run a search for those terms in a web database (e.g. id.loc.gov).
> > You might check out Selenium IDE for working with web forms: > http://docs.seleniumhq.org/projects/ide/. The recording feature makes it > really easy to get started with as an automation tool. I've used it > extensively for automated metadata editing: > http://digital.library.unt.edu/ark:/67531/metadc86138/m1/1/ > > Cheers! > > Andrew > > > On Fri, Jul 4, 2014 at 6:54 AM, Riley Childs wrote: > >> Don't forget AutoIT (auto IT, pretty clever eh?) >> http://www.autoitscript.com/site/autoit/ >> >> Riley Childs >> Student >> Asst. Head of IT Services >> Charlotte United Christian Academy >> (704) 497-2086 >> RileyChilds.net >> Sent from my Windows Phone, please excuse mistakes >> >> -Original Message- >> From: "Owen Stephens" >> Sent: 7/4/2014 4:55 AM >> To: "CODE4LIB@LISTSERV.ND.EDU" >> Subject: [CODE4LIB] 'automation' tools >> >> I'm doing a workshop in the UK at a library tech unconference-style event >> (Pi and Mash http://piandmash.info) on automating computer based tasks. >> I want to cover tools that are usable by non-programmers and that would >> work in a typical library environment. The types of tools I'm thinking of >> are: >> >> MacroExpress >> AutoHotKey >> iMacros for Firefox >> >> While I'm hoping workshop attendees will bring ideas about tasks they >> would like to automate the type of thing I have in mind are things like: >> >> Filling out a set of standard data on a GUI or Web form (e.g. standard set >> of budget codes for an order) >> Processing a list of item barcodes from a spreadsheet and doing something >> with them on the library system (e.g. change loan status, check for holds) >> Similarly for User IDs >> Navigating to a web page and doing some task >> >> Clearly some of these tasks would be better automated with appropriate >> APIs and scripts, but I want to try to introduce those without programming >> skills to some of the concepts and tools and essentially how they can work >> around problems themselves to some extent. 
>> >> What tools do you use for this kind of automation task, and what kind of >> tasks do they best deal with? >> >> Thanks, >> >> Owen >> >> Owen Stephens >> Owen Stephens Consulting >> Web: http://www.ostephens.com >> Email: o...@ostephens.com >> Telephone: 0121 288 6936 >>
Re: [CODE4LIB] coders who library? [was: Let me shadow you, librarians who code!]
I'm a librarian first, and a slightly poor excuse for a coder second. I've always focussed on the IT/tech side of librarianship in my career, and did at one point cross from libraries into more general IT management - then firmly put myself back into libraries. To a certain extent I left library employment to freelance as a consultant to get out of the academic library career path that kept taking me into management - which I realised, after several years doing it, was just not what got me out of bed in the morning.

There is a name for people without an MLS who can still quote MARC subfields or write MODS XML freehand: http://shambrarian.org :)

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 7 Jul 2014, at 15:36, Miles Fidelman wrote:

> This recent spate of messages leads me to wonder: how many folks here who "code for libraries" have a library science degree/background, vs. folks who come from other backgrounds? What about folks who end up in technology management/direction positions for libraries?
>
> Personally: computer scientist and systems engineer, did some early Internet-in-public-library deployments, got to write a book about it. Not actively doing library related work at the moment.
>
> Miles Fidelman
>
> Dot Porter wrote:
>> I'm a medieval manuscripts curator who codes, in Philadelphia, and I'd be happy to talk to you as well.
>>
>> Dot
>>
>> On Tue, Jul 1, 2014 at 10:30 AM, David Mayo wrote:
>>
>>> If you'd like to talk to someone who did a library degree, and currently works as a web developer supporting an academic library, I'd be happy to talk with you.
>>> >>> - Dave Mayo >>> Software Engineer @ Harvard > HUIT > LTS >>> >>> >>> On Tue, Jul 1, 2014 at 10:12 AM, Steven Anderson < >>> stevencander...@hotmail.com> wrote: >>> >>>> Jennie, >>>> As with others, I'm not a librarian as I lack a library degree, but I do >>>> Digital Repository Development for the Boston Public Library >>> (specifically: >>>> https://www.digitalcommonwealth.org/). Feel free to let me know you want >>>> to chat for your masters paper. >>>> Sincerely,Steven AndersonWeb Services - Digital Library Repository >>>> developer617-859-2393sander...@bpl.org >>>> >>>>> Date: Tue, 1 Jul 2014 13:51:07 + >>>>> From: mschofi...@nova.edu >>>>> Subject: Re: [CODE4LIB] Let me shadow you, librarians who code! >>>>> To: CODE4LIB@LISTSERV.ND.EDU >>>>> >>>>> Hey Jennie, >>>>> >>>>> I'm waaay south of MA but I'm pretty addicted to talking about coding >>> as >>>> a library job O_o. If you are still in want of guinea-pigs, I'd love to >>>> skype / hangout. >>>>> Michael Schofield >>>>> // mschofi...@nova.edu >>>>> >>>>> -Original Message- >>>>> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf >>> Of >>>> Jennie Rose Halperin >>>>> Sent: Monday, June 30, 2014 3:58 PM >>>>> To: CODE4LIB@LISTSERV.ND.EDU >>>>> Subject: [CODE4LIB] Let me shadow you, librarians who code! >>>>> >>>>> hey Code4Lib, >>>>> >>>>> Do you work in a library and also like coding? Do you do coding as >>> part >>>> of your job? >>>>> I'm writing my masters paper for the University of North Carolina at >>>> Chapel Hill and I'd like to shadow and interview up to 10 librarians and >>>> archivists who also work with code in some way in the Boston area for the >>>> next two weeks. >>>>> I'd come by and chat for about 2 hours, and the whole thing will not >>>> take up too much of your time. >>>>> Not in Massachusetts? Want to skype? Let me know and that would be >>>> possible. 
>>>>> I know that this list has a pretty big North American presence, but I >>>> will be in Berlin beginning July 14, and could potentially shadow anyone >>> in >>>> Germany as well. >>>>> Best, >>>>> >>>>> Jennie Rose Halperin >>>>> jennie.halpe...@gmail.com >>>> >> >> > > > -- > In theory, there is no difference between theory and practice. > In practice, there is. Yogi Berra
Re: [CODE4LIB] 'automation' tools
Thanks again all, I love OpenRefine - I've been working on the GOKb project (http://gokb.org) where K-Int (a UK-based company) have developed an extension for OpenRefine which helps with the cleaning of data about electronic resources (esp. journals) from publishers and then pushes it into the GOKb database. The extension is fully integrated into the GOKb database, but if anyone wants a look the code is at https://github.com/k-int/gokb-phase1/tree/dev/refine. The extension checks the data and reports errors as well as offering ways of fixing common issues - there's more on the wiki: https://wiki.kuali.org/display/OLE/OpenRefine+How-Tos

I did pitch an OpenRefine workshop for the same event as a 'data wrangling/cleaning' tool, but the 'automation' session got the vote in the end - although there is definitely overlap. However, I am delivering an OpenRefine workshop at the British Library next week - and it's great to see it getting used across libraries.

The Google Docs Spreadsheets suggestion is also a great tip - I've run a course at the British Library which uses this to introduce the concept of APIs to non-techies. I blogged the original tutorial at http://www.meanboyfriend.com/overdue_ideas/2013/02/introduction-to-apis/ but a change to the BL open data platform means this no longer works :((

Thanks all again - I'll be trying to put stuff from the automation workshop online at some point and I'll post here when there is something up.

Best wishes,
Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 8 Jul 2014, at 03:52, davesgonechina wrote:

> +1 to OpenRefine. Some extensions, like RDF Refine <http://refine.deri.ie/>,
> currently only work with the old Google Refine (still available here
> <https://code.google.com/p/google-refine/>). There's a good deal of
> interesting projects for OpenRefine on GitHub and GitHub Gist.
> > Google Docs Spreadsheets also has a surprising amount of functionality, > such as importXML if you're willing to get your hands dirty with regular > expressions. > > Dave > > > On Tue, Jul 8, 2014 at 3:12 AM, Tillman, Ruth K. (GSFC-272.0)[CADENCE GROUP > ASSOC] wrote: > >> Definite cosign on Open Refine. It's intuitive and spreadsheet-like enough >> that a lot of people can understand it. You can do anything from >> standardizing state names you get from a patron form to normalizing >> metadata keywords for a database, so I think it'd be useful even for >> non-techies. >> >> Ruth Kitchin Tillman >> Metadata Librarian, Cadence Group >> NASA Goddard Space Flight Center Library, Code 272 >> Greenbelt, MD 20771 >> Goddard Library Repository: http://gsfcir.gsfc.nasa.gov/ >> 301.286.6246 >> >> >> -Original Message- >> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of >> Terry Brady >> Sent: Monday, July 07, 2014 1:35 PM >> To: CODE4LIB@LISTSERV.ND.EDU >> Subject: Re: [CODE4LIB] 'automation' tools >> >> I learned about Open Refine <http://openrefine.org/> at the Code4Lib >> conference, and it looks like it would be a great tool for normalizing >> data. I worked on a few projects in the past in which this would have been >> very helpful. >>
[CODE4LIB] Automation tools - session at the "Pi and Mash" unconference
Dear all,

A month or so ago I asked for recommendations for automation tools that people used in libraries, to help inform a session I was going to run. The unconference event (Pi and Mash) ran this weekend, and I just wanted to share the materials I wrote for the session in case they are of any help.

The materials consist of a slidedeck called "Automated Love Presentation" (available as Keynote, Powerpoint and PDF) and some examples and exercises you can work through in a document called "Automated Love Examples" (available as Pages, Word doc, PDF and ePub). There are also two accompanying files, 'ISBNs.xlsx' and 'isbns.csv', which are used in the examples/exercises. All materials are available at http://bit.ly/automatedlovefolder

Thanks to all who made suggestions which contributed towards the session.

Best wishes,
Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936
Re: [CODE4LIB] Automated searching of Copac/Worldcat
The worksheets I circulated earlier in the week include examples of how to take a list of ISBNs from a spreadsheet/csv file and search on Worldcat (see the 'Automated Love Examples' docs in http://bit.ly/automatedlovefolder)

What these examples don't do is show how to check the outcome of the search automatically and record it. I think it would be relatively easy to extend the iMacros example to extract a hit count / no hits message and write this to a file using the iMacros SAVEAS command, but I haven't tried this.

For a 'no results' outcome you'd want to look for the presence of (or extract the contents of) a div with id=div-results-none. For a results count you'd want to look for the contents of a table within the div with class=resultsinfo.

Alternatively you could look at the Selenium IDE extension for Firefox, which is more complex but allows a more sophisticated approach to checking and writing out information about text present/absent in the web pages retrieved.

Hope that is of some help

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 13 Aug 2014, at 11:20, Nicholas Brown wrote:

> Apologies for cross posting
>
> Dear collective wisdom,
>
> I'm interested in using automation software such as Macro Express or iMacros
> to feed a list of ISBNs from a spreadsheet into Copac or Worldcat and output
> a list of those that return no matches in the results screen. The idea would
> be to create a tool that can quickly, although rather roughly, identify rare
> items in a collection (though obviously this would be limited to items with
> ISBNs or other unique identifiers). I can write a macro which will
> sequentially search either catalogue for a list of ISBNs but am struggling
> with how to have the macro identify items with no matches (I have a vague
> idea about searching the results screen for the text "Sorry, there are no
> search results") and to compile them back into a spreadsheet.
> > I'd be keen to hear if anyone has attempted something similar, general > advice, any potential pitfalls in the method outlined above or suggestions > for a better way to achieve the same results. If something useful comes of it > I'd be happy to share the results. > > Many thanks for your help, > Nick > > Nicholas Brown > Library and Information Manager > nbr...@iniva.org > +44 (0)20 7749 1125 > www.iniva.org
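For anyone scripting this outside iMacros, the checks described above (a div with id=div-results-none for no hits, a div with class=resultsinfo around the hit-count table) could be done in a few lines of Python with the standard library's html.parser. This is a sketch only: the HTML snippets used here are invented, the real WorldCat markup would need verifying, and actually fetching a results page per ISBN is left out.

```python
from html.parser import HTMLParser

class WorldcatResultCheck(HTMLParser):
    """Scan a results page for the markers described above:
    id="div-results-none" means no hits; class="resultsinfo"
    wraps the table holding the hit count."""

    def __init__(self):
        super().__init__()
        self.no_results = False
        self.has_count = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if attrs.get("id") == "div-results-none":
                self.no_results = True
            if "resultsinfo" in (attrs.get("class") or ""):
                self.has_count = True

def check_results_page(html):
    """Classify one (already fetched) results page."""
    checker = WorldcatResultCheck()
    checker.feed(html)
    if checker.no_results:
        return "no match"
    return "match" if checker.has_count else "unknown"
```

The "unknown" case matters in practice: if the page layout changes, you want rows flagged for manual checking rather than silently misclassified.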
Re: [CODE4LIB] IFTTT and barcodes
As noted by Tara, when using IFTTT (or similar tools like Bip.io and WappWolf) you are limited to the channels/services the tool has already integrated. You are also in the position of having to give a third party service access to personal information and the ability to read/write certain services.

I was investigating these types of services very briefly for a recent workshop and I came across an open source alternative called Huginn, which you can run on your own server and of course can extend to work with whatever services/channels you want. I thought it looked interesting - available from https://github.com/cantino/huginn

Overkill for this particular problem but may be of more general interest

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 11 Sep 2014, at 08:21, Sylvain Machefert wrote:

> Hello,
> Maybe an easier, more IFTTT-related solution would be to develop a
> Yahoo Pipe; using the ISBN and querying the webpac should be easy for Yahoo
> Pipes, and you can then search in the page using XPath or things like that. Should
> be easier than developing a custom script (if you have no development
> knowledge; otherwise it could be scripted easily in PHP, Python, whatever).
>
> I haven't used YPipes in a long time but I think it's worth looking at it.
>
> Sylvain
>
> On 10/09/2014 21:48, Ian Walls wrote:
>> I don't think IFTTT is the right tool, but the basic idea is sound.
>>
>> With a spot of custom scripting on some server somewhere, one could take in
>> an ISBN, look it up via the III WebPac, assess eligibility conditions, then
>> return yes or no. Barcode Scanner on Android has the ability to do custom
>> search URLs, so if your yes/no script can accept URL params, then you should
>> be all set.
>> >> Barring a script, just a lookup of the MARC record may be possible, and if >> it was styled in a mobile-friendly manner, perhaps you could quickly glean >> whether it's okay or not for copy cataloging. >> >> Side question: is there connectivity in the stacks for doing this kind of >> lookup? I know in my library, that's not always the case. >> >> >> -Ian >> >> -Original Message- >> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of >> Riley Childs >> Sent: Wednesday, September 10, 2014 3:31 PM >> To: CODE4LIB@LISTSERV.ND.EDU >> Subject: Re: [CODE4LIB] IFTTT and barcodes >> >> Webhooks via the WordPress channel? >> >> Riley Childs >> Senior >> Charlotte United Christian Academy >> Library Services Administrator >> IT Services >> (704) 497-2086 >> rileychilds.net >> @rowdychildren >> >> From: Tara Robertson<mailto:trobert...@langara.bc.ca> >> Sent: 9/10/2014 3:03 PM >> To: CODE4LIB@LISTSERV.ND.EDU<mailto:CODE4LIB@LISTSERV.ND.EDU> >> Subject: Re: [CODE4LIB] IFTTT and barcodes >> >> Hi, >> >> I don't think this is possible using IFTTT right now as existing channels >> don't exist to create a recipe. I'm trying to think of what those channels >> would be and can't quite...I don't think IFTTT is the best tool for this >> task. >> >> What ILS are you using? Could you hook a barcode scanner up to a tablet and >> scan, then check the MARC...nah, that's seeming almost as time consuming as >> taking it to your desk (depending on how far your desk is). >> I recall at an Evergreen hackfest that someone was tweaking the web >> interface for an inventory type exercise, where it would show red or green >> depending on some condition. >> >> Cheers, >> Tara >> >> On 10/09/2014 11:52 AM, Harper, Cynthia wrote: >>> Now that someone has mentioned IFTTT, I'm reading up on it and wonder if >> it could make this task possible: >>> One of my tasks is copy cataloging. 
I'm only authorized to do LC copy, >> which involves opening the record (already downloaded in the acq process), >> and checking to see that 490 doesn't exist (I can't handle series), and >> looking for DLC in the 040 |a and |c. >>> It's discouraging when I take 10 books back to my desk from the cataloging >> shelf, and all 10 are not eligible for copy cataloging. >>> S... could I take my phone to the cataloging shelf, use IFTTT to scan >> my ISBN, search in the III Webpac, look at the MARc record and tell me >> whether it's LC copy? >>> Empower the frontline workers! :) >>> >>> Cindy Harper >>> Electronic Services and Serials Librarian Virginia Theological >>> Seminary >>> 3737 Seminary Road >>> Alexandria VA 22304 >>> 703-461-1794 >>> char...@vts.edu >> >> -- >> >> Tara Robertson >> >> Accessibility Librarian, CAPER-BC <http://caperbc.ca/> T 604.323.5254 F >> 604.323.5954 trobert...@langara.bc.ca >> <mailto:tara%20robertson%20%3ctrobert...@langara.bc.ca%3E> >> >> Langara. <http://www.langara.bc.ca> >> >> 100 West 49th Avenue, Vancouver, BC, V5Y 2Z6
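Whatever transport ends up doing the scan-and-lookup, the eligibility test Cindy describes (no 490 field, DLC in both 040 $a and $c) is simple to express once you have the MARC record in hand. A minimal Python sketch - the record structure here is an invented stand-in (a dict of tag to list of fields, each field a dict of subfield code to value), not a real MARC parser such as pymarc:

```python
def eligible_for_lc_copy(record):
    """Apply the copy-cataloging rules described above:
    reject any record with a 490 (series) field, and require
    DLC in both subfield a and subfield c of an 040 field."""
    if record.get("490"):
        return False  # series present - can't handle
    for field in record.get("040", []):
        if field.get("a") == "DLC" and field.get("c") == "DLC":
            return True
    return False
```

A yes/no web script like the one Ian suggests could call this after fetching and parsing the record, returning a mobile-friendly green/red page.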
Re: [CODE4LIB] Python or Perl script for reading RDF/XML, Turtle, or N-triples Files
I've not tried using the LCNAF RDF files, and I've not used RDFLib, but a couple of things from (a relatively small amount of) experience parsing RDF:

- Don't try to parse the RDF/XML; use n-triples instead
- As Kyle mentioned, you might want to use command line tools to strip down the n-triples to only deal with the data you actually want
- Rapper and the Redland RDF libraries are a good place to start, and have bindings to Perl, PHP, Python and Ruby (http://librdf.org/raptor/rapper.html and http://librdf.org)
- This StackOverflow Q&A might help getting started: http://stackoverflow.com/questions/5678623/how-to-parse-big-datasets-using-rdflib
- If you want to move between RDF formats, an alternative to Rapper is http://www.l3s.de/~minack/rdf2rdf/ - this succeeded in converting a file of 48 million triples in ttl to ntriples where Rapper failed with an 'out of memory' error (once in ntriples, Rapper can be used for further parsing)

Some slightly random advice there, but maybe some of it will be useful!

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 30 Sep 2014, at 15:54, Jeremy Nelson wrote:

> Hi Jean,
> I've found rdflib (https://github.com/RDFLib/rdflib) on the Python side exceedingly simple to work with and use. For example, to load the current BIBFRAME vocabulary as an RDF graph using a Python shell:
>
> >>> import rdflib
> >>> bf_vocab = rdflib.Graph().parse('http://bibframe.org/vocab/')
> >>> len(bf_vocab)  # Total number of triples
> 1683
> >>> set([s for s in bf_vocab])  # A set of all unique subjects in the graph
>
> This module offers RDF/XML, Turtle, and N-triples support, with various options for retrieving and manipulating the graph's subjects, predicates, and objects. I would advise installing the JSON-LD (https://github.com/RDFLib/rdflib-jsonld) extension as well.
> > Jeremy Nelson > Metadata and Systems Librarian > Colorado College > > -Original Message- > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jean > Roth > Sent: Tuesday, September 30, 2014 8:14 AM > To: CODE4LIB@LISTSERV.ND.EDU > Subject: [CODE4LIB] Python or Perl script for reading RDF/XML, Turtle, or > N-triples Files > > Thank you so much for the reply. > > I have not investigated the LCNAF data set thoroughly. However, my > default/ideal is to read in all variables from a dataset. > > So, I was wondering if any one had an example Python or Perl script for > reading RDF/XML, Turtle, or N-triples file. A simple/partial example would > be fine. > > Thanks, > > Jean > > On Mon, 29 Sep 2014, Kyle Banerjee wrote: > > KB> The best way to handle them depends on what you want to do. You need > KB> to actually download the NAF files rather than countries or other > KB> small files as different kinds of data will be organized > KB> differently. Just don't try to read multigigabyte files in a text > KB> editor :) > KB> > KB> If you start with one of the giant XML files, the first thing you'll > KB> probably want to do is extract just the elements that are > KB> interesting to you. A short string parsing or SAX routine in your > KB> language of choice should let you get the information in a format you > like. > KB> > KB> If you download the linked data files and you're interested in > KB> actual headings (as opposed to traversing relationships), grep and > KB> sed in combination with the join utility are handy for extracting > KB> the elements you want and flattening the relationships into > KB> something more convenient to work with. But there are plenty of other > tools that you could also use. > KB> > KB> If you don't already have a convenient environment to work on, I'm a > KB> fan of virtualbox. You can drag and drop things into and out of your > KB> regular desktop or even access it directly. 
That way you can > KB> view/manipulate files with the linux utilities without having to > KB> deal with a bunch of clunky file transfer operations involving > KB> another machine. Very handy for when you have to deal with multigigabyte > files. > KB> > KB> kyle > KB> > KB> On Mon, Sep 29, 2014 at 11:19 AM, Jean Roth wrote: > KB> > KB> > Thank you! It looks like the files are available as RDF/XML, > KB> > Turtle, or N-triples files. > KB> > > KB> > Any examples or suggestions for reading any of these formats? > KB> > > KB> > The MARC Countries file is small, 31-79 kb. I assume a script > KB> > that would read a small file like that would at least be a start > KB> > for the LCNAF > KB> > > KB> > > KB>
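The "strip down the n-triples" advice in this thread is easy to sketch in plain Python without loading anything into memory: n-triples is line-oriented, so you only ever need one line at a time, which is exactly what makes it friendlier than RDF/XML for multi-gigabyte files. The URIs below are invented examples, not real LCNAF data; a real run would check which predicates the dataset actually uses.

```python
def filter_ntriples(lines, predicate_uri):
    """Yield only the triples whose predicate matches.
    Splitting on the first two whitespace runs isolates subject and
    predicate while leaving object literals (which may contain
    spaces) untouched. Streams line by line, so huge files are fine."""
    wanted = "<%s>" % predicate_uri
    for line in lines:
        parts = line.split(None, 2)  # subject, predicate, rest of triple
        if len(parts) == 3 and parts[1] == wanted:
            yield line

# Hypothetical sample data standing in for a real n-triples dump
sample = [
    '<http://example.org/n1> <http://www.w3.org/2004/02/skos/core#prefLabel> "Smith, John" .',
    '<http://example.org/n1> <http://www.w3.org/2002/07/owl#sameAs> <http://example.org/n2> .',
]
labels = list(filter_ntriples(sample, "http://www.w3.org/2004/02/skos/core#prefLabel"))
```

With a real file you'd pass `open("lcnaf.nt")` instead of the list; this is the same job grep does, but keeps the logic in the script alongside whatever processing comes next.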
Re: [CODE4LIB] ISSN lists?
It may depend on exactly what you need.

The ISSN Centre offer licensed access to their ISSN portal at a cost (http://www.issn.org) - my experience is that this is pretty comprehensive.

The ISSN Centre also offer a download of ISSN-L tables - this is available for free (although you have to state what you intend to do with it before you can download). This is just ISSNs (mapped to their ISSN-Ls), but if you don't need bibliographic details then it would be a good source.

As well as WorldCat you could also try Suncat, which offers a z39.50 connection (http://www.suncat.ac.uk/support/z-target.shtml), but obviously this has the same issue as the WorldCat approach.

GOKb and KB+ are both initiatives trying to build knowledgebases containing many ISSNs, with data to be made available under a CC0 declaration. Both of these are focussed on describing bundles/packages of journals. GOKb is going into preview imminently (http://gokb.org/news) and KB+ already offers downloads: http://www.kbplus.ac.uk/kbplus/publicExport. KB+ currently has details of around 25k journals.

There may also be some large-scale open data initiatives that give you a reasonably good set of ISSNs. For example the RLUK release of 60m+ records at http://www.theeuropeanlibrary.org/tel4/access/data/lod, or the 12 million records released by Harvard at http://openmetadata.lib.harvard.edu/bibdata (both CC0).

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 17 Oct 2014, at 03:16, Stuart Yeates wrote:

> My understanding is that there is no universal ISSN list but that worldcat
> allows querying of their database by ISSN.
>
> Which method of sampling the ISSN namespace is going to cause least pain?
> http://www.worldcat.org/ISSN/ seems to be the one talked about, but is there
> another that's less resource intensive? Maybe someone's already exported this
> data?
>
> cheers
> stuart
> --
> I have a new phone number: 04 463 5692
Re: [CODE4LIB] Linux distro for librarians
This triggered a memory of a project that was putting together a ready-to-go toolset for Digital Humanities - which I then couldn't remember the details of - but luckily Twitter was able to remember it for me (thanks to @mackymoo https://twitter.com/mackymoo)

The project is DH Box (http://dhbox.org), which tries to put together an environment suitable for DH work. I think that originally this was to be done via installation on the user's local machine, but due to the challenges of dealing with the variation in local environments they've now moved to a 'box in the cloud' approach (the change of direction is noted at http://dhbox.commons.gc.cuny.edu/blog/2014/dh-box-new-friend-new-platform#sthash.27THWR6E.dpbs). To be honest I'm not 100% sure where the project stands right now, as it looks like not much has been updated since May 2014.

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

> On 21 Oct 2014, at 15:42, Brad Coffield wrote:
>
> Is what you're really after an environment pre-loaded with useful tools
> for various types of librarians? If so, maybe instead of rolling your own
> distro (and all the work and headache that involves, like a second
> full-time job) maybe create software bundles for linux? Have a website
> where you have lists of software by librarian type. Then make it easy for
> linux users to install them (repo's and what not) ((I haven't been active
> in linux for a while))
>
> Just thinking out loud.
>
> --
> Brad Coffield, MLIS
> Assistant Information and Web Services Librarian
> Saint Francis University
> 814-472-3315
> bcoffi...@francis.edu
Re: [CODE4LIB] MARC reporting engine
The MARC XML seemed to be an archive within an archive - I had to gunzip to get innzmetadata.xml then rename to innzmetadata.xml.gz and gunzip again to get the actual xml Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 > On 3 Nov 2014, at 22:38, Robert Haschart wrote: > > I was going to echo Eric Hatcher's recommendation of Solr and SolrMarc, since > I'm the creator of SolrMarc. > It does provide many of the same tools as are described in the toolset you > linked to, but it is designed to write to Solr rather than to a SQL style > database. Solr may or may not be more suitable for your needs then a SQL > database. However I decided to download the data to see whether SolrMarc > could handle it. I started with the MARCXML.gz data, ungzipped it to get a > .XML file, but the resulting file causes SolrMarc to blow chunks. Either > I'm missing something or there is something way wrong with that data.The > gzipped binary MARC file work fine with the SolrMarc tools. > > Creating a SolrMarc script to extract the 700 fields, plus a bash script to > cluster and count them, and sort by frequency took about 20 minutes. > > -Bob Haschart > > > On 11/3/2014 3:00 PM, Stuart Yeates wrote: >> Thank you to all who responded with software suggestions. >> https://github.com/ubleipzig/marctools is looking like the most promising >> candidate so far. The more I read through the recommendations the more it >> dawned on me that I don't want to have to configure yet another java >> toolchain (yes I know, that may be personal bias). >> >> Thank you to all who responded about the challenges of authority control in >> such collections. I'm aware of these issues. 
The current project is about >> marshalling resources for editors to make informed decisions about rather >> than automating the creation of articles, because there is human judgement >> involved in the last step I can afford to take a few authority control >> 'risks' >> >> cheers >> stuart >> >> -- >> I have a new phone number: 04 463 5692 >> >> >> From: Code for Libraries on behalf of raffaele >> messuti >> Sent: Monday, 3 November 2014 11:39 p.m. >> To: CODE4LIB@LISTSERV.ND.EDU >> Subject: Re: [CODE4LIB] MARC reporting engine >> >> Stuart Yeates wrote: >>> Do any of these have built-in indexing? 800k records isn't going to fit in >>> memory and if building my own MARC indexer is 'relatively straightforward' >>> then you're a better coder than I am. >> you could try marcdb[1] from marctools[2] >> >> [1] https://github.com/ubleipzig/marctools#marcdb >> [2] https://github.com/ubleipzig/marctools >> >> >> -- >> raffaele
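The "archive within an archive" problem Owen describes (a gzip file whose payload is itself another gzip stream) can be handled defensively in Python by checking for the gzip magic bytes and decompressing until the payload no longer looks gzipped. A small sketch, not tied to the innz data specifically:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream

def gunzip_fully(data):
    """Decompress repeatedly while the payload still looks gzipped,
    so double-wrapped files come out the same as single-wrapped ones.
    XML (or MARC) data never starts with the magic bytes, so the
    loop stops as soon as the real content is reached."""
    while data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)
    return data
```

This avoids the rename-and-gunzip-again dance: read the downloaded file as bytes, pass it through `gunzip_fully`, and write out the result.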
Re: [CODE4LIB] Stack Overflow
Another option would be a 'code4lib Q&A' site. Becky Yoose set up one for Coding/Cataloguing and so can comment on how much effort it's been. In terms of asking/answering questions the use is clearly low, but I think the content that is there is (generally) good quality and useful.

I guess the hard part of any project like this is going to be building the community around it. The first things that occur to me are how you encourage people to ask questions on a new site rather than via existing methods, and how you build enough community activity around housekeeping such as noting duplicate questions and merging/closing. The latter might be a nice problem to have, but the former is where both the Library / LIS SE and the Digital Preservation SE fell down, and libcatcode suffers the same problem - just not enough activity to be a go-to destination.

I'm supportive of the idea, but I'd hate to see this go through the pain of the SE process only to fail for the same reasons as previous efforts in this area. I think we need to think about this underlying problem - but I'm not sure what the solution is/solutions are.

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

> On 4 Nov 2014, at 15:34, Schulkins, Joe wrote:
>
> To be honest I absolutely hate the whole reputation and badge system for
> exactly the reasons you outline, but I can't deny that I do find the family
> of Stack Exchange sites extremely useful, and by comparison Listservs just
> seem very archaic to me, as it's all too easy for a question (and/or its
> answer) to drop through the cracks of a popular discussion. Are Listservs
> really the best way to deal with help? I would even prefer a Drupal site...
> > > Joseph Schulkins| Systems Librarian| University of Liverpool Library| PO Box > 123 | Liverpool L69 3DA | joseph.schulk...@liverpool.ac.uk| T 0151 794 3844 > > Follow us: @LivUniLibrary Like us: LivUniLibrary Visit us: > http://www.liv.ac.uk/library > Special Collections & Archives blog: http://manuscriptsandmore.liv.ac.uk > > > > > -Original Message- > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of > Joshua Welker > Sent: 04 November 2014 14:43 > To: CODE4LIB@LISTSERV.ND.EDU > Subject: Re: [CODE4LIB] Stack Overflow > > The concept of a library technology Stack Exchange site as a google-able > repository of information sounds great. However, I do have quite a few > reservations. > > 1. Stack Exchange sites seem to naturally lead to gatekeeping, snobbishness, > and other troll behaviors. The reputation system built into those sites > really go to a lot of folks' heads. High-ranking users seem to take pleasure > in shutting down questions as off-topic, redundant, etc. > Argument and one-upmanship are actively promoted--"The previous answer sucks. > Here's my better answer! " This tends to attract certain (often > male) personalities and to repel certain (often female) personalities. > This seems very contrary to the direction the Code4Lib community has tried to > move in the last few years of being more inclusive and inviting to women > instead of just promoting the stereotypical "IT guy" qualities that dominate > most IT-related discussions on the Internet. More here: > > http://www.banane.com/2012/06/20/there-are-no-women-on-stackoverflow-or-ar > e-there/ > http://michael.richter.name/blogs/why-i-no-longer-contribute-to-stackoverf > low > > 2. Having a Stack Exchange site might fragment the already quite small and > nascent library technology community. This might be an unfounded worry, but > it's worth consideration. 
A lot of Q&A takes place on this listserv, and it > would be awkward to try to have all this information in both places. That > said, searching StackExchange is much easier than searching a listserv. > > 3. I echo your concerns about vendors. Libraries have a culture of protecting > vendors from criticism. Sure, we do lots of criticism behind closed doors, > but nowhere that leaves an online footprint. Often, our contracts include a > clause that we have to keep certain kinds of information private. I don't > think this is a very positive aspect of librarian culture, but it is there. > > I think a year or two ago that there was a pretty long discussion on this > listserv about creating a Stack Exchange site. > > Josh Welker > > > -Original Message- > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of > Schulkins, Joe > Sent: Tuesday, November 04, 2014 8:12 AM > To: CODE4LIB@LISTSERV.ND.EDU > Subject: [CODE4LIB] Stack Ove
Re: [CODE4LIB] Stack Overflow
Thanks for that Mark. That's running on 'question2answer' which looks to have a reasonable amount of development going on around it https://github.com/q2a/question2answer/graphs/contributors (given Becky's comments about OSQA which still hold true) Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 > On 4 Nov 2014, at 16:05, Mark A. Matienzo wrote: > > On Tue, Nov 4, 2014 at 11:00 AM, Owen Stephens wrote: > >> Another option would be a 'code4lib Q&A' site. Becky Yoose set up one for >> Coding/Cataloguing and so can comment on how much effort its been. In terms >> of asking/answering questions the use is clearly low but I think the >> content that is there is (generally) good quality and useful. >> >> I guess the hard part of any project like this is going to be building the >> community around it. The first things that occur to me is how you encourage >> people to ask the question on this new site, rather than via existing >> methods and how do you build enough community activity around housekeeping >> such as noting duplicate questions and merging/closing. The latter might be >> a nice problem to have, but the former is where both the Library / LIS SE >> and the Digital Preservation SE fell down, and libcatcode suffers the same >> problem - just not enough activity to be a go-to destination. > > > I would add that the Digital Preservation SE has been reinstantiated as > Digital Preservation Q&A <http://qanda.digipres.org/>, which is organized > and supported by the Open Planets Foundation and the National Digital > Stewardship Alliance. > > Mark A. Matienzo > Director of Technology, Digital Public Library of America
[CODE4LIB] Automatically updating documentation with screenshots
I work on a web application, and when we release a new version there are often updates to make to existing user documentation - especially screenshots, where unrelated changes (e.g. the addition of a new top-level menu item) can make updating whole sets of screenshots across all the documentation desirable.

I'm looking at whether we could automate the generation of screenshots somehow, which has taken me into documentation tools such as Sphinx [http://sphinx-doc.org] and Dexy [http://dexy.it]. However, ideally I want something simple enough for the application support staff to be able to use.

Anyone done/tried anything like this?

Cheers

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936
Re: [CODE4LIB] Automatically updating documentation with screenshots
Thanks all - I'm looking at both Selenium and Casperjs now. I also came across a plugin for 'Robot Framework' [http://robotframework.org] which allows you to grab screenshots (via Selenium) and annotate with notes - along the lines that Ross suggested. The plugin is 'Selenium2Screenshots' [https://github.com/datakurre/robotframework-selenium2screenshots] Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 > On 26 Jan 2015, at 13:16, Mads Villadsen wrote: > > I have used casperjs for this purpose. A small script that loads urls at > multiple different resolutions/user agents and takes a screenshot of each of > them. > > Regards > > -- > Mads Villadsen > Statsbiblioteket > It-udvikler
Re: [CODE4LIB] Automatically updating documentation with screenshots
... and further to this I've just found a neat Chrome plugin which will record a set of actions/tests as a CasperJS script, including screenshots - my first impressions are pretty positive, and the code produced looks pretty clean. The plugin is called 'Resurrectio' [https://github.com/ebrehault/resurrectio]

Cheers

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

> On 26 Jan 2015, at 13:48, Owen Stephens wrote:
>
> Thanks all - I'm looking at both Selenium and CasperJS now.
>
> I also came across a plugin for 'Robot Framework' [http://robotframework.org] which allows you to grab screenshots (via Selenium) and annotate with notes - along the lines that Ross suggested. The plugin is 'Selenium2Screenshots' [https://github.com/datakurre/robotframework-selenium2screenshots]
>
> Owen
>
> Owen Stephens
> Owen Stephens Consulting
> Web: http://www.ostephens.com
> Email: o...@ostephens.com
> Telephone: 0121 288 6936
>
>> On 26 Jan 2015, at 13:16, Mads Villadsen wrote:
>>
>> I have used casperjs for this purpose. A small script that loads urls at multiple different resolutions/user agents and takes a screenshot of each of them.
>>
>> Regards
>>
>> --
>> Mads Villadsen
>> Statsbiblioteket
>> It-udvikler
Re: [CODE4LIB] Code4LibCon video crew thanks
Apologies for a +1 message, but you know... +1 and some Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 > On 13 Feb 2015, at 18:00, Cary Gordon wrote: > > I want to deeply thank Ashley Blewer, Steven Anderson and Josh Wilson for > running the video streaming and capture at Code4LibCon in Portland. Because > of you, we had great video in real time (and I got to actually watch the > presentations). I also want to again thank Riley Childs, who could not make > it this year. Riley moved the bar up last year by putting together our > YouTube presence. > > For the second year running, we requested and were not allowed to setup and > test the day before, and for the second year running lost part of the opening > session. Fortunately, we did capture most of what did not get streamed on > Tuesday, and I will put that online next week. There is always next year. > > Thanks, > > Cary
Re: [CODE4LIB] linked data question
I highly recommend Chapter 6 of the Linked Data book, which details different design approaches for Linked Data applications - section 6.3 (http://linkeddatabook.com/editions/1.0/#htoc84) summarises the approaches as: 1. Crawling Pattern 2. On-the-fly dereferencing pattern 3. Query federation pattern Generally my view would be that (1) and (2) are viable approaches for different applications, but that (3) is generally a bad idea (having been through federated search before!) Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 > On 26 Feb 2015, at 14:40, Eric Lease Morgan wrote: > > On Feb 25, 2015, at 2:48 PM, Esmé Cowles wrote: > >>> In the non-techie library world, linked data is being talked about (perhaps >>> only in listserv traffic) as if the data (bibliographic data, for instance) >>> will reside on remote sites (as a SPARQL endpoint??? We don't know the >>> technical implications of that), and be displayed by >> catalog/the centralized inter-national catalog> by calling data from that >>> remote site. But the original question was how the data on those remote >>> sites would be - how can I start my search by searching for >>> that remote content? I assume there has to be a database implementation >>> that visits that data and pre-indexes it for it to be searchable, and >>> therefore the index has to be local (or global a la Google or OCLC or its >>> bibliographic-linked-data equivalent). >> >> I think there are several options for how this works, and different >> applications may take different approaches. The most basic approach would >> be to just include the URIs in your local system and retrieve them any time >> you wanted to work with them. But the performance of that would be >> terrible, and your application would stop working if it couldn't retrieve >> the URIs. 
>> >> So there are lots of different approaches (which could be combined): >> >> - Retrieve the URIs the first time, and then cache them locally. >> - Download an entire data dump of the remote vocabulary and host it locally. >> - Add text fields in parallel to the URIs, so you at least have a label for >> it. >> - Index the data in Solr, Elasticsearch, etc. and use that most of the time, >> esp. for read-only operations. > > > Yes, exactly. I believe Esmé has articulated the possible solutions well. > escowles++ —ELM
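Pattern 2 above, on-the-fly dereferencing, is easy to make concrete: request the URI itself with an Accept header asking for RDF, at lookup time. A minimal sketch in Python (using a VIAF URI that appears elsewhere in these threads; the Accept value is standard HTTP content negotiation, not a VIAF-specific API):

```python
from urllib.request import Request, urlopen

def dereference(uri):
    """Pattern 2 - on-the-fly dereferencing: ask the remote server for RDF
    about a URI via content negotiation, at the moment it is needed."""
    req = Request(uri, headers={'Accept': 'application/rdf+xml'})
    with urlopen(req) as resp:  # a network round-trip on every lookup
        return resp.read()

# Building the request is enough to see the pattern; actually calling
# dereference() hits the remote server on every lookup, which is why a
# local cache or a full data dump (pattern 1) usually performs better.
req = Request('http://viaf.org/viaf/54178741',
              headers={'Accept': 'application/rdf+xml'})
```

The per-lookup network dependency is exactly the performance and availability problem Esmé describes, which is what pushes real applications towards caching or local indexing.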
Re: [CODE4LIB] eebo
Hi Eric, I’ve worked with EEBO as part of the Jisc Historical Texts (https://historicaltexts.jisc.ac.uk/home) platform, which provides access to EEBO and other collections for UK Universities. My work covered the metadata, the searching of metadata and full text, and the display of results. I was mainly looking at metadata but did some digging into the TEI files to see how the markup could be used to extract metadata (e.g. presence of illustrations in the text). I was lucky (?!) enough to have access to the MARC records, but I did also do some work looking at the metadata included in the TEI files. If there is anything I can help with, I’d be happy to. The people who worked with the files in detail were a UK software development company, Knowledge Integration (http://www.k-int.com/) - I can give you a contact there if that would be helpful. Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 > On 5 Jun 2015, at 13:10, Eric Lease Morgan wrote: > > Does anybody here have experience reading the SGML/XML files representing the > content of EEBO? > > I’ve gotten my hands on approximately 24 GB of SGML/XML files representing > the content of EEBO (Early English Books Online). This data does not include > page images. Instead it includes metadata of various ilks as well as the > transcribed full text. I desire to reverse engineer the SGML/XML in order to: > 1) provide an alternative search/browse interface to the collection, and 2) > support various types of text mining services. > > While I am making progress against the data, it would be nice to learn of > other people’s experience so I do not not re-invent the wheel (too many > times). ‘Got ideas? > > — > Eric Lease Morgan > University Of Notre Dame
[CODE4LIB] Global Open Knowledgebase APIs
Dear all, GOKb, the Global Open Knowledgebase, is a community-managed project that aims to describe electronic journals and books, publisher packages, and platforms in a way that will be familiar to librarians who have worked with electronic resources. I’ve been working on the project since it started, working with others to gather requirements, develop the underlying data models and specify functionality for the system. GOKb opened to ‘public preview’ in January 2015, and you can sign up for an account and access the service at https://gokb.kuali.org/gokb/

Several hundred ejournal packages, and associated information about the ejournal titles, platforms and organisations, have been added to the knowledgebase over the past few months. Alongside this work of adding content we have also opened up APIs to interact with the service. We are interested in:

* Understanding how people would like to use data from GOKb via APIs (or other mechanisms)
* Getting some use of the initial APIs and getting feedback on these
* Getting feedback on other APIs people would like to see

The current APIs we support are:

The ‘Coreference’ service
The main aim of this API is to provide a list of identifiers associated with a title. The service allows you to provide a journal identifier (such as an ISSN) and get back basic information about the journal, including the title and other identifiers associated with the journal (other ISSNs, DOIs, publisher identifiers etc.).
Documentation: https://github.com/k-int/gokb-phase1/wiki/Co-referencing-Detail
Access: https://gokb.kuali.org/gokb/coreference/index

OAI Interfaces
The main aim of this API is to enable other services to obtain data from GOKb on an ongoing basis. 
Information about ejournal packages, titles and organisations can be obtained via this service.
Documentation: https://github.com/k-int/gokb-phase1/wiki/OAI-Interfaces-for-Synchronization
Access: http://gokb.kuali.org/gokb/oai

Add/Update API
This API supports adding and updating data in GOKb. You can add new, or update existing, Organisations and Platforms. You can add additional identifiers to Journal titles.
Documentation: https://github.com/k-int/gokb-phase1/wiki/Integration---Telling-GOKb-about-new-or-corresponding-resources-and-local-identifiers

We also have a SPARQL endpoint available on our test service (which contains test data only). The SPARQL endpoint is at http://test-gokb.kuali.org/sparql, and a set of example queries is given at https://github.com/k-int/gokb-phase1/wiki/Sample-SPARQL

Feedback on any/all of this would be very welcome - either to the list for discussion, or directly to me. We want to make sure we can provide useful data and services and hope you can help us do this. Best wishes, Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936
Re: [CODE4LIB] eebo [developments]
Great stuff Eric. I’ve just seen another interesting take based (mainly) on data in the TCP-EEBO release https://scalablereading.northwestern.edu/2015/06/07/shakespeare-his-contemporaries-shc-released/ It includes mention of MorphAdorner[1] which does some clever stuff around tagging parts of speech, spelling variations, lemmata etc. and another tool which I hadn’t come across before AnnoLex[2] "for the correction and annotation of lexical data in Early Modern texts”. This paper[3] from Alistair Baron and Andrew Hardie at the University of Lancaster in the UK about preparing EEBO-TCP texts for corpus-based analysis may also be of interest, and the team at Lancaster have developed a tool called VARD which supports pre-processing texts[4] Owen [1] http://morphadorner.northwestern.edu [2] http://annolex.at.northwestern.edu [3] http://eprints.lancs.ac.uk/60272/1/Baron_Hardie.pdf [4] http://ucrel.lancs.ac.uk/vard/about/ Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 > On 7 Jun 2015, at 18:48, Eric Lease Morgan wrote: > > Here some of developments with my playing with the EEBO data. > > I used the repository on Box to get my content, and I mirrored it locally. > [1, 2] I then looped through the content using XPath to extract rudimentary > metadata, thus creating a “catalog” (index). Along the way I calculated the > number of words in each document and saved that as a field of each "record". > Being a tab-delimited file, it is trivial to import the catalog into my > favorite spreadsheet, database, editor, or statistics program. This allowed > me to browse the collection. I then used grep to search my catalog, and save > the results to a file. [5] I searched for Richard Baxter. [6, 7, 8]. I then > used an R script to graph the numeric data of my search results. Currently, > there are only two types: 1) dates, and 2) number of words. 
[9, 10, 11, 12] > From these graphs I can tell that Baxter wrote a lot of relatively short > things, and I can easily see when he published many of his works. (He > published a lot around 1680 but little in 1665.) I then transformed the > search resu! lt! > s into a browsable HTML table. [13] The table has hidden features. (Can you > say, “Usability?”) For example, you can click on table headers to sort. This > is cool because I want sort things by number of words. (Number of pages > doesn’t really tell me anything about length.) There is also a hidden link to > the left of each record. Upon clicking on the blank space you can see > subjects, publisher, language, and a link to the raw XML. > > For a good time, I then repeated the process for things Shakespeare and > things astronomy. [14, 15] Baxter took me about twelve hours worth of work, > not counting the caching of the data. Combined, Shakespeare and astronomy > took me less than five minutes. I then got tired. > > My next steps are multi-faceted and presented in the following incomplete > unordered list: > > * create browsable lists - the TEI metadata is clean and >consistent. The authors and subjects lend themselves very well to >the creation of browsable lists. > > * CGI interface - The ability to search via Web interface is >imperative, and indexing is a prerequisite. > > * transform into HTML - TEI/XML is cool, but… > > * create sets - The collection as a whole is very interesting, >but many scholars will want sub-sets of the collection. I will do >this sort of work, akin to my work with the HathiTrust. [16] > > * do text analysis - This is really the whole point. Given the >full text combined with the inherent functionality of a computer, >additional analysis and interpretation can be done against the >corpus or its subsets. This analysis can be based the counting of >words, the association of themes, parts-of-speech, etc. 
For >example, I plan to give each item in the collection a colors, >“big” names, and “great” ideas coefficient. These are scores >denoting the use of researcher-defined “themes”. [17, 18, 19] You >can see how these themes play out against the complete writings >of “Dead White Men With Three Names”. [20, 21, 22] > > Fun with TEI/XML, text mining, and the definition of librarianship. > > > [1] Box - http://bit.ly/1QcvxLP > [2] mirror - http://dh.crc.nd.edu/sandbox/eebo-tcp/xml/ > [3] xpath script - http://dh.crc.nd.edu/sandbox/eebo-tcp/bin/xml2tab.pl > [4] catalog (index) - http://dh.crc.nd.edu/sandbox/eebo-tcp/catalog.txt > [5] search results - http://dh.crc.nd.edu/sandbox/eebo-tcp/baxter/baxter.txt > [6] Baxter at VIAF - http://viaf.org/viaf/54178741 > [7] Baxter at WorldCat - http://www.worldcat.org/wcidentit
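The loop Eric describes - pull rudimentary metadata out of each TEI/XML file, count the words, and write a tab-delimited catalog - can be sketched in a few lines of Python. This is not the actual script at [3]; the element names and the namespace handling below are simplified assumptions:

```python
import glob
import re
import xml.etree.ElementTree as ET

def catalog_row(path):
    """Extract rudimentary metadata plus a word count from one TEI/XML file."""
    root = ET.parse(path).getroot()
    # Strip namespaces so simple element searches work regardless of TEI flavour
    for el in root.iter():
        el.tag = el.tag.split('}')[-1]
    title = root.findtext('.//title', default='')
    author = root.findtext('.//author', default='')
    date = root.findtext('.//date', default='')
    words = len(re.findall(r'\w+', ' '.join(root.itertext())))
    return '\t'.join([path, title.strip(), author.strip(), date.strip(), str(words)])

def build_catalog(pattern, out='catalog.txt'):
    """Loop over the corpus and write one tab-delimited row per document."""
    with open(out, 'w') as fh:
        for path in glob.glob(pattern):
            fh.write(catalog_row(path) + '\n')
```

Being tab-delimited, the resulting catalog can be grepped, sorted, or pulled straight into a spreadsheet or R, exactly as described above.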
Re: [CODE4LIB] eebo [perfect texts]
And some of the researchers definitely care about this (authority control, high quality descriptive metadata). I went to a hack day focussing on the EEBO-TCP Phase 1 release (these texts). I mentioned to one of the researchers (not a librarian) that I had access to some MARC records which described the works. Their immediate response was “Ah - but which MARC records, because they aren’t all of the same quality”! There are good cataloguing records for the works but they have not been made available under an open licence alongside the transcribed texts. Probably the highest quality records are those in the English Short Title Catalogue (ESTC) http://estc.bl.uk. There have been some great steps forward in the last few years, but I still feel libraries need to increase the amount they are doing to publish metadata under explicitly open licences. Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 > On 8 Jun 2015, at 23:23, Stuart A. Yeates wrote: > > Another thing that could usefully be done is significantly better authority > control. Authors, works, geographical places, subjects, etc, etc. > > Good core librarianship stuff that is essentially orthogonal to all the > other work that appears to be happening. > > cheers > stuart > > -- > ...let us be heard from red core to black sky > > On Tue, Jun 9, 2015 at 12:42 AM, Eric Lease Morgan wrote: > >> On Jun 8, 2015, at 7:32 AM, Owen Stephens wrote: >> >>> I’ve just seen another interesting take based (mainly) on data in the >> TCP-EEBO release: >>> >>> >> https://scalablereading.northwestern.edu/2015/06/07/shakespeare-his-contemporaries-shc-released/ >>> >>> It includes mention of MorphAdorner[1] which does some clever stuff >> around tagging parts of speech, spelling variations, lemmata etc. and >> another tool which I hadn’t come across before AnnoLex[2] "for the >> correction and annotation of lexical data in Early Modern texts”. 
>>> >>> This paper[3] from Alistair Baron and Andrew Hardie at the University of >> Lancaster in the UK about preparing EEBO-TCP texts for corpus-based >> analysis may also be of interest, and the team at Lancaster have developed >> a tool called VARD which supports pre-processing texts[4] >>> >>> [1] http://morphadorner.northwestern.edu >>> [2] http://annolex.at.northwestern.edu >>> [3] http://eprints.lancs.ac.uk/60272/1/Baron_Hardie.pdf >>> [4] http://ucrel.lancs.ac.uk/vard/about/ >> >> >> All of this is really very interesting. Really. At the same time, there >> seems to be a WHOLE lot of effort spent on cleaning and normalizing data, >> and very little done to actually analyze it beyond “close reading”. The >> final goal of all these interfaces seem to be refined search. Frankly, I >> don’t need search. And the only community who will want this level of >> search will be the scholarly scholar. “What about the undergraduate >> student? What about the just more than casual reader? What about the >> engineer?” Most people don’t know how or why parts-of-speech are important >> let alone what a lemma is. Nor do they care. I can find plenty of things. I >> need (want) analysis. Let’s assume the data is clean — or rather, accept >> the fact that there is dirty data akin to the dirty data created through >> OCR and there is nothing a person can do about it — lets see some automated >> comparisons between texts. 
Examples might include: >> >> * this one is longer >> * this one is shorter >> * this one includes more action >> * this one discusses such & such theme more than this one >> * so & so theme came and went during a particular time period >> * the meaning of this phrase changed over time >> * the author’s message of this text is… >> * this given play asserts the following facts >> * here is a map illustrating where the protagonist went when >> * a summary of this text includes… >> * this work is fiction >> * this work is non-fiction >> * this work was probably influenced by… >> >> We don’t need perfect texts before analysis can be done. Sure, perfect >> texts help, but they are not necessary. Observations and generalization can >> be made even without perfectly transcribed texts. >> >> — >> ELM >>
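Several of the comparisons Eric lists ("this one is longer", "this one is shorter") need nothing more than word counts, which tolerate imperfect OCR reasonably well - supporting the point that analysis can start before the texts are perfect. A minimal sketch:

```python
import re

def text_stats(text):
    """Crude, OCR-tolerant measures: length and lexical variety."""
    words = re.findall(r'[a-zA-Z]+', text.lower())
    return {
        'words': len(words),
        'unique': len(set(words)),
        'variety': len(set(words)) / len(words) if words else 0.0,
    }

def compare(a, b):
    """Return a one-line, human-readable comparison of two texts."""
    sa, sb = text_stats(a), text_stats(b)
    longer = 'first' if sa['words'] > sb['words'] else 'second'
    return f"the {longer} text is longer ({sa['words']} vs {sb['words']} words)"
```

Theme coefficients of the kind described above would follow the same shape: count occurrences of a researcher-defined word list instead of all words.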
Re: [CODE4LIB] Desiring Advice for Converting OCR Text into Metadata and/or a Database
It may depend on the format of the PDF, but I’ve used the Scraperwiki Python Module ‘pdf2xml’ function to extract text data from PDFs in the past. There is a write up (not by me) at http://schoolofdata.org/2013/08/16/scraping-pdfs-with-python-and-the-scraperwiki-module/, and an example of how I’ve used it at https://github.com/ostephens/british_library_directory_of_library_codes/blob/master/scraper.py Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 > On 18 Jun 2015, at 17:02, Matt Sherman wrote: > > Hi Code4Libbers, > > I am working with colleague on a side project which involves some scanned > bibliographies and making them more web searchable/sortable/browse-able. > While I am quite familiar with the metadata and organization aspects we > need, but I am at a bit of a loss on how to automate the process of putting > the bibliography in a more structured format so that we can avoid going > through hundreds of pages by hand. I am pretty sure regular expressions > are needed, but I have not had an instance where I need to automate > extracting data from one file type (PDF OCR or text extracted to Word doc) > and place it into another (either a database or an XML file) with some > enrichment. I would appreciate any suggestions for approaches or tools to > look into. Thanks for any help/thoughts people can give. > > Matt Sherman
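Once the text is out of the PDF, the regular-expression step Matt describes might look something like the sketch below. The entry pattern is entirely hypothetical (one entry per line, "Author. Title. Imprint, Year."), and real OCR'd bibliographies will need a much more forgiving pattern:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical entry shape: "Surname, Forename. Title. Place: Publisher, Year."
ENTRY = re.compile(
    r'^(?P<author>[^.]+)\.\s+(?P<title>[^.]+)\.\s+.*?(?P<year>\d{4})\.?\s*$'
)

def parse_entries(text):
    """Turn one-entry-per-line OCR text into structured records."""
    records = []
    for line in text.splitlines():
        m = ENTRY.match(line.strip())
        if m:
            records.append(m.groupdict())
    return records

def to_xml(records):
    """Serialise the parsed records as a simple XML bibliography."""
    root = ET.Element('bibliography')
    for rec in records:
        entry = ET.SubElement(root, 'entry')
        for field, value in rec.items():
            ET.SubElement(entry, field).text = value
    return ET.tostring(root, encoding='unicode')
```

Lines that fail to match can be written to a reject file for the hand-checking pass, which keeps the manual work proportional to the OCR errors rather than to the whole bibliography.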
Re: [CODE4LIB] Processing Circ data
Another option might be to use OpenRefine http://openrefine.org - this should easily handle 250,000 rows. I find it good for basic data analysis, and there are extensions which offer some visualisations (e.g. the VIB BITs extension which will plot simple data using d3 https://www.bits.vib.be/index.php/software-overview/openrefine) I’ve written an introduction to OpenRefine available at http://www.meanboyfriend.com/overdue_ideas/2014/11/working-with-data-using-openrefine/ Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 > On 5 Aug 2015, at 21:07, Harper, Cynthia wrote: > > Hi all. What are you using to process circ data for ad-hoc queries. I > usually extract csv or tab-delimited files - one row per item record, with > identifying bib record data, then total checkouts over the given time > period(s). I have been importing these into Access then grouping them by bib > record. I think that I've reached the limits of scalability for Access for > this project now, with 250,000 item records. Does anyone do this in R? My > other go-to- software for data processing is RapidMiner free version. Or do > you just use MySQL or other SQL database? I was looking into doing it in R > with RSQLite (just read about this and sqldf > http://www.r-bloggers.com/make-r-speak-sql-with-sqldf/ ) because I'm sure my > IT department will be skeptical of letting me have MySQL on my desktop. > (I've moved into a much more users-don't-do-real-computing kind of > environment). I'm rusty enough in R that if anyone will give me some > start-off data import code, that would be great. 
> > Cindy Harper > E-services and periodicals librarian > Virginia Theological Seminary > Bishop Payne Library > 3737 Seminary Road > Alexandria VA 22304 > char...@vts.edu<mailto:char...@vts.edu> > 703-461-1794
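The "SQL on a flat file" approach Cindy mentions (sqldf/RSQLite in R) has a direct equivalent in Python's standard library, which avoids the MySQL-on-the-desktop problem entirely. A sketch - the column names (bib_id, title, checkouts) are assumptions about the export, not a real ILS schema:

```python
import csv
import sqlite3

def total_checkouts(csv_path):
    """Load an item-level circ export and total checkouts by bib record.
    Column names here are assumed: bib_id, title, checkouts."""
    con = sqlite3.connect(':memory:')  # 250,000 rows fits comfortably in memory
    con.execute('CREATE TABLE circ (bib_id TEXT, title TEXT, checkouts INTEGER)')
    with open(csv_path, newline='') as fh:
        rows = [(r['bib_id'], r['title'], int(r['checkouts']))
                for r in csv.DictReader(fh)]
    con.executemany('INSERT INTO circ VALUES (?, ?, ?)', rows)
    return con.execute(
        'SELECT bib_id, title, SUM(checkouts) FROM circ '
        'GROUP BY bib_id ORDER BY SUM(checkouts) DESC').fetchall()
```

Because the table lives in a single SQLite database (in memory here, or a file), the same ad-hoc GROUP BY queries that Access handled can be run without any server software.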
Re: [CODE4LIB] Protocol-relative URLs in MARC
In theory the 1st indicator dictates the protocol used, and 4 = HTTP. However, in all the examples on http://www.loc.gov/marc/bibliographic/bd856.html, despite the indicator being used, the protocol part of the URI is still repeated in the $u subfield. You can put ‘7’ in the 1st indicator and then use subfield $2 to define other access methods. Since ‘http’ is one of the preset protocols but ‘https’ is not, I guess in theory this means you should use something like 856 70 $uhttps://example.com$2https I’d be pretty surprised if in practice people don’t just do: 856 40 $uhttps://example.com Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 > On 17 Aug 2015, at 21:41, Stuart A. Yeates wrote: > > I'm in the middle of some work which includes touching the 856s in lots of > MARC records pointing to websites we control. The websites are available on > both https://example.org/ and http://example.org/ > > Can I put //example.org/ in the MARC or is this contrary to the standard? > > Note that there is a separate question about whether various software > systems support this, but that's entirely secondary to the question of the > standard. > > cheers > stuart > -- > ...let us be heard from red core to black sky
Re: [CODE4LIB] Protocol-relative URLs in MARC
Sorry - addressing the actual question, rather than the one in my head, the 856 field "is also repeated when more than one access method is used” - so my reading is you should be doing both: 856 40 $uhttp://example.com 856 70 $uhttps://example.com$2https Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 > On 18 Aug 2015, at 00:00, Owen Stephens wrote: > > In theory the 1st indicator dictates the protocol used and 4 =HTTP. However, > in all examples on http://www.loc.gov/marc/bibliographic/bd856.html, despite > the indicator being used, the protocol part of the URI it is then repeated in > the $u field. > > You can put ‘7’ in the 1st indicator, then use subfield $2 to define other > methods. > > Since only ‘http’ is one of the preset protocols, not https, I guess in > theory this means you should use something like > > 856 70 $uhttps://example.com$2https > > I’d be pretty surprised if in practice people don’t just do: > > 856 40 $uhttps://example.com > > Owen > > > Owen Stephens > Owen Stephens Consulting > Web: http://www.ostephens.com > Email: o...@ostephens.com > Telephone: 0121 288 6936 > >> On 17 Aug 2015, at 21:41, Stuart A. Yeates wrote: >> >> I'm in the middle of some work which includes touching the 856s in lots of >> MARC records pointing to websites we control. The websites are available on >> both https://example.org/ and http://example.org/ >> >> Can I put //example.org/ in the MARC or is this contrary to the standard? >> >> Note that there is a separate question about whether various software >> systems support this, but that's entirely secondary to the question of the >> standard. >> >> cheers >> stuart >> -- >> ...let us be heard from red core to black sky >
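Taken together, the reading in the two messages above means a record for a dual-protocol resource carries two 856 fields. A throwaway Python helper (hypothetical - just to make the encoding concrete for batch work like Stuart's) might emit them like this:

```python
def access_fields(url):
    """Emit 856 fields for a resource reachable over both HTTP and HTTPS,
    following the reading above: ind1=4 for HTTP, ind1=7 plus $2 for HTTPS."""
    http_url = url.replace('https://', 'http://', 1)
    https_url = url.replace('http://', 'https://', 1)
    return [
        f'856 40 $u{http_url}',
        f'856 70 $u{https_url}$2https',
    ]

# access_fields('https://example.org')
# → ['856 40 $uhttp://example.org', '856 70 $uhttps://example.org$2https']
```

Note that `str.replace` with count 1 leaves an already-https URL untouched when building the https variant, so the helper accepts either form of the input URL.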
Re: [CODE4LIB] Job: Wine Loving Developer at University of California, Davis
That may well be true, but ‘getting the job done’ isn’t the only aspect of a crowdsourcing project. It can be used to engage an audience more deeply in the collection and give them some investment in it. This can help with overall visibility of the collection on the web (through those people who have engaged sharing what they are doing/seeing etc.), and future use, and be a platform for further projects. A project like this could also offer a way of experimenting with crowdsourcing in a low-risk way. And of course the developer is needed for the visualisation aspect anyway, so the recruitment needs to happen and a wage needs to be paid regardless ... Whether all this balances out against the economics/efficiency of getting the job done in the cheapest possible way is a judgement that needs to be made, but I don’t think the simple economic argument is the only one in play here. Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 > On 10 Dec 2015, at 23:42, James Morley wrote: > > I agree with Thomas's logic, if not the maths (surely $2,000?) > > I was going to do a few myself but it looks like comments have been disabled > on the Flickr images? > > > From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Thomas > Krichel [kric...@openlib.org] > Sent: 10 December 2015 23:17 > To: CODE4LIB@LISTSERV.ND.EDU > Subject: Re: [CODE4LIB] Job: Wine Loving Developer at University of > California, Davis > > j...@code4lib.org writes > > >> **PROJECT DETAILS** >> The UC Davis University Library is launching a project to digitize the >> [Amerine wine label >> collection](https://www.flickr.com/photos/brantley/sets/72 >> 157655817440104/with/21116552632/) > > Some look like hard to read. > >> and engage the public to transcribe the information contained on the >> labels and associated annotations. > > This may take a long time. 
I suggest rather than doing that, take > somebody in a low-income country who speaks French, say, and who will > type all the data in. That way you get consistency in the data. I > live in Siberia, I can find somebody there. Once this data is in a > simple text file, you can use in-house staff to attach it to the > label images in your systems. > > Crowdsource sounds cool, but for 4000 label it makes no sense. > If the typist gets $10/h, and gets 20 labels done in 1h, we > are talking $200. The visit you are planning for your developer > will cost that much. > -- > > Cheers, > > Thomas Krichel http://openlib.org/home/krichel > skype:thomaskrichel
Re: [CODE4LIB] searching metadata vs searching content
To share the practice from a project I work on - the Jisc Historical Texts platform[1], which provides searching across digitised texts from the 16th to 19th centuries. In this case we had the option to build the search application from scratch, rather than using a product such as ContentDM etc. I should say that all the technical work was done by K-Int [2] and Gooii [3]; I was there to advise on metadata and user requirements, and so the following is based on my understanding of how the system works, and any errors are down to me :) There are currently three major collections within the Historical Texts platform, with different data sources behind each one. In general the data we have for each collection consists of MARC metadata records, full text in XML documents (either from transcription or from OCR processes) and image files of the pages. The platform is built using the ElasticSearch [4] (ES) indexing software (as with Solr this is built on top of Lucene). We structure the data we index in ES in two layers - the ‘publication’ record, which is essentially where all the MARC metadata lives (although not as MARC - we transform this to an internal scheme), and the ‘page’ records - one record per page in the item. The text content lives in the page record, along with links to the image files for the page. The ‘page’ records are all what ES calls ‘child’ records of the relevant publication record. We make this relationship through shared IDs in the MARC records and the XML fulltext documents. We create a whole range of indexes from this data. Obviously field-specific searches like title or author only search the relevant metadata fields. But we also have a (default) ’search all’ option which searches through all the metadata and fulltext. If the user wants to search the text only, they check an option and we limit the search to only text from records of the ‘page’ type. 
The results the user gets initially are always the publication level records - so essentially your results list is a list of books. For each result you can view ‘matches in text’ which shows snippets of where your search term appears in the fulltext. You can then either click to view the whole book, or click the relevant page from the list of snippets. When you view the book, the software retrieves all the ‘page’ records for the book, and from the page records can retrieve the image files. When the user goes to the book viewer, we also carry over the search terms from their search, so they can see the same text snippets of where the terms appear alongside the book viewer - so the user can navigate to the pages which contain the search terms easily. For more on the ES indexing side of this, Rob Tice from Knowledge Integration did a talk about the use of ES in this context at the London Elasticsearch usergroup [5]. Unfortunately the interface itself requires a login, but if you want to get a feel for how this all works in the UI, there is also a screencast which gives an overview of the UI available [6]. Best wishes, Owen 1. https://historicaltexts.jisc.ac.uk 2. http://www.k-int.com 3. http://www.gooii.com 4. https://www.elastic.co 5. http://www.k-int.com/Rob-Tice-Elastic-London-complex-modelling-of-rich-text-data-in-Elasticsearch 6. http://historicaltexts.jisc.ac.uk/support Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 > On 27 Jan 2016, at 00:30, Laura Buchholz wrote: > > Hi all, > > I'm trying to understand how digital library systems work when there is a > need to search both metadata and item text content (plain text/full text), > and when the item is made up of more than one file (so, think a digitized > multi-page yearbook or newspaper). I'm not looking for answers to a > specific problem, really, just looking to know what is the current state of > community practice. 
> > In our current system (ContentDM), the "full text" of something lives in > the metadata record, so it is indexed and searched along with the metadata, > and essentially treated as if it were metadata. (Correct?). This causes > problems in advanced searching and muddies the relationship between what is > typically a descriptive metadata record and the file that is associated > with the record. It doesn't seem like a great model for the average digital > library. True? I know the answer is "it depends", but humor me... :) > > If it isn't great, and there are better models, what are they? I was taught > METS in school, and based on that, I'd approach the metadata in a METS or > METS-like fashion. But I'm unclear on the steps from having a bunch of METS > records that include descriptive metadata and pointers to text files of the > OCR (we don't, but if we did...) to indexing and providing results to > users. I think an
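The publication/page parent/child structure described in this thread can be illustrated with the syntax Elasticsearch offered in its 1.x releases (the `_parent` mapping and `has_child` query). The index and field names below are invented for illustration - they are not the actual Historical Texts schema:

```python
# Hypothetical field names - not the actual Historical Texts schema.
mappings = {
    'publication': {           # one record per book, holding the (ex-MARC) metadata
        'properties': {
            'title':  {'type': 'string'},
            'author': {'type': 'string'},
        }
    },
    'page': {                  # one record per page, a child of its publication
        '_parent': {'type': 'publication'},
        'properties': {
            'text':      {'type': 'string'},
            'image_url': {'type': 'string', 'index': 'not_analyzed'},
        }
    },
}

# Searching the page text while returning publication-level results is a
# has_child query: find publications with at least one matching page.
text_only_search = {
    'query': {
        'has_child': {
            'type': 'page',
            'query': {'match': {'text': 'whale'}},
        }
    }
}
```

A child document is indexed with its parent's ID (e.g. `PUT /texts/page/p1?parent=pub1`), which is where the shared IDs from the MARC records and the XML full-text files come into play at load time.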
Re: [CODE4LIB] Directories of OAI-PMH repositories
Also see OpenDOAR http://www.opendoar.org We used this listing when building Core http://core.kmi.open.ac.uk/search - which aggregates and does full-text analysis and similarity matching across OA repositories Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 7 Feb 2013, at 23:19, Wilhelmina Randtke wrote: > Thanks! The list of lists is very helpful. > > -Wilhelmina Randtke > > On Thu, Feb 7, 2013 at 2:40 PM, Habing, Thomas Gerald > wrote: > >> Here is a registry of OAI-PMH repositories that we maintain (sporadically) >> here at Illinois: http://gita.grainger.uiuc.edu/registry/ >> >> Tom >> >>> -Original Message- >>> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of >>> Phillips, Mark >>> Sent: Thursday, February 07, 2013 2:13 PM >>> To: CODE4LIB@LISTSERV.ND.EDU >>> Subject: Re: [CODE4LIB] Directories of OAI-PMH repositories >>> >>> You could start here. >>> >>> http://www.openarchives.org/pmh/ >>> >>> Mark >>> >>> From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of >>> Wilhelmina Randtke [rand...@gmail.com] >>> Sent: Thursday, February 07, 2013 2:03 PM >>> To: CODE4LIB@LISTSERV.ND.EDU >>> Subject: [CODE4LIB] Directories of OAI-PMH repositories >>> >>> Is there a central listing of places that track and list OAI-PMH >> repository >>> feeds? I have an OAI-PMH compliant repository, so now am looking for >>> places to list that so that harvesters or anyone who is interested can >> find it. >>> >>> -Wilhelmina Randtke >>
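Once a repository turns up in a directory like OpenDOAR or the Illinois registry, the first step of harvesting it is a plain OAI-PMH GET request. A minimal sketch of building those requests; the base URL below is a placeholder, not a real endpoint:

```python
# Sketch: building OAI-PMH request URLs for a repository found via a
# directory. The base URL is a placeholder endpoint for illustration.
from urllib.parse import urlencode

def oai_request(base_url, verb, **kwargs):
    """Build an OAI-PMH request URL, e.g. Identify or ListRecords."""
    params = {"verb": verb, **kwargs}
    return base_url + "?" + urlencode(params)

base = "http://example.org/oai"  # placeholder repository endpoint
print(oai_request(base, "Identify"))
print(oai_request(base, "ListRecords", metadataPrefix="oai_dc"))
```

An aggregator like Core then pages through `ListRecords` responses (following `resumptionToken`s) for each repository on the list.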
Re: [CODE4LIB] You *are* a coder. So what am I?
"Shambrarian": Someone who knows enough truth about how libraries really work, but not enough to go insane or be qualified as a real librarian. (See more at http://m.urbandictionary.com/#define?term=Shambrarian) More information available at http://shambrarian.org/ And Dave Pattern has published a handy guide to Librarian/Shambrarian interactions ("DO NOT bore the librarian by showing them your Roy Tennant Fan Club membership card") http://daveyp.wordpress.com/2011/07/21/librarianshambrarian-venn-diagram/ Tongue firmly in cheek, Owen On 14 Feb 2013, at 00:22, Maccabee Levine wrote: > Andromeda's talk this afternoon really struck a chord, as I shared with her > afterwards, because I have the same issue from the other side of the fence. > I'm among the 1/3 of the crowd today with a CS degree and and IT > background (and no MLS). I've worked in libraries for years, but when I > have a point to make about how technology can benefit instruction or > reference or collection development, I generally preface it with "I'm not a > librarian, but...". I shouldn't have to be defensive about that. > > Problem is, 'coder' doesn't imply a particular degree -- just the > experience from doing the task, and as Andromeda said, she and most C4Lers > definitely are coders. But 'librarian' *does* imply MLS/MSLS/etc., and I > respect that. > > What's a library word I can use in the same way as coder? > > Maccabee > > -- > Maccabee Levine > Head of Library Technology Services > University of Wisconsin Oshkosh > levi...@uwosh.edu > 920-424-7332
[CODE4LIB] British Library Directory of Libraries (probably of interest to UK only)
The British Library has a directory of library codes used by UK-registered users of its Document Supply service. The Directory of Library Codes enables British Library customers to convert the library codes they are given in response to location searches into names and addresses. It also indicates each library's supply and charging policies. More information at http://www.bl.uk/reshelp/atyourdesk/docsupply/help/replycodes/dirlibcodes/ As far as I know the only format this data has ever been made available in is PDF. I've always thought this a shame, so I've written a scraper on ScraperWiki to extract the data from the PDF and make it available as structured, queryable data. The scraper and output are at https://scraperwiki.com/scrapers/british_library_directory_of_library_codes/ Just in case anyone would find it useful. Also, any suggestions for improving the scraper are welcome (I don't usually write Python so the code is probably even ropier than my normal code :) Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936
Re: [CODE4LIB] DOI scraping
I'd say yes to the investment in jQuery generally - not too difficult to get the basics if you already use JavaScript, and it makes some things a lot easier. It sounds like you are trying to do something not dissimilar to LibX http://libx.org ? (except via bookmarklet rather than as a browser plugin). Also, for custom database scrapers it might be worth looking at Zotero translators, as they already exist for many major sources and I guess will be grabbing the DOI where it exists if they can http://www.zotero.org/support/dev/translators Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 17 May 2013, at 05:32, "Fitchett, Deborah" wrote: > Kia ora koutou, > > I’m wanting to create a bookmarklet that will let people on a journal article > webpage just click the bookmarklet and get a permalink to that article, > including our proxy information so it can be accessed off-campus. > > Once I’ve got a DOI (or other permalink, but I’ll cross that bridge later), > the rest is easy. The trouble is getting the DOI. The options seem to be: > > 1. Require the user to locate and manually highlight the DOI on the > page. This is very easy to code, not so easy for the user who may not even > know what a DOI is let alone how to find it; and some interfaces make it hard > to accurately select (I’m looking at you, ScienceDirect). > > 2. Live in hope of universal COinS implementation. I might be waiting a > long time. > > 3. Work out, for each database we use, how to scrape the relevant > information from the page. Harder/tedious to code, but makes it easy for the > user. > > I’ve been looking around for existing code that does something like #3. 
So far > I’ve found: > > · CiteULike’s bookmarklet (jQuery at http://www.citeulike.org/bm - > afaik it’s all rights reserved) > > · AltMetrics’ bookmarklet (jQuery at > http://altmetric-bookmarklet.dsci.it/assets/content.js - MIT licensed) > > Can anyone think of anything else I should be looking at for inspiration? > > Also on a more general matter: I have the general level of Javascript that > one gets by poking at things and doing small projects and then getting > distracted by other things and then coming back some months later for a > different small project and having to relearn it all over again. I’ve long > had jQuery on my “I guess I’m going to have to learn this someday but, um, > today I just wanna stick with what I know” list. So is this the kind of thing > where it’s going to be quicker to learn something about jQuery before I get > started, or can I just as easily muddle along with my existing limited > Javascript? (What really are the pros and cons here?) > > Nāku noa, nā > > Deborah Fitchett > Digital Access Coordinator > Library, Teaching and Learning > > p +64 3 423 0358 > e deborah.fitch...@lincoln.ac.nz<mailto:deborah.fitch...@lincoln.ac.nz> | w > library.lincoln.ac.nz<http://library.lincoln.ac.nz/> > > Lincoln University, Te Whare Wānaka o Aoraki > New Zealand's specialist land-based university > > > > P Please consider the environment before you print this email. > "The contents of this e-mail (including any attachments) may be confidential > and/or subject to copyright. Any unauthorised use, > distribution, or copying of the contents is expressly prohibited. If you > have received this e-mail in error, please advise the sender > by return e-mail or telephone and then delete this e-mail together with all > attachments from your system." >
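A sketch of option 3 in pure scraping terms (shown in Python for brevity; the same two regexes transfer directly to a jQuery bookmarklet): many publisher pages carry the DOI in a Highwire-style `citation_doi` meta tag, and a DOI in running text can be caught with a pattern match. Coverage and meta-tag attribute order vary by publisher, so treat this as a starting point, not a universal scraper:

```python
# Sketch: extract a DOI from page HTML. Prefer the citation_doi meta tag
# (where the publisher provides one); fall back to a regex scan of the
# markup. Assumes name= comes before content= in the meta tag, which is
# common but not guaranteed.
import re

DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+', re.IGNORECASE)
META_RE = re.compile(
    r'<meta[^>]+name=["\']citation_doi["\'][^>]+content=["\']([^"\']+)',
    re.IGNORECASE)

def find_doi(html):
    """Return the first DOI found in the HTML, or None."""
    m = META_RE.search(html)
    if m:
        return m.group(1)
    m = DOI_RE.search(html)
    return m.group(0) if m else None

sample = '<meta name="citation_doi" content="10.1126/science.169.3946.635">'
print(find_doi(sample))  # 10.1126/science.169.3946.635
```

The regex fallback can over-match trailing punctuation on some pages, which is one reason the per-database scrapers (option 3 proper) remain more reliable than a generic pattern.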
Re: [CODE4LIB] best way to make MARC files available to anyone
Putting the files on GitHub might be an option - free for public repositories, and 38Mb should not be a problem to host there Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 12 Jun 2013, at 02:24, Dana Pearson wrote: > I have crosswalked the Project Gutenberg RDF/DC metadata to MARC. I would > like to make these files available to any library that is interested. > > I thought that I would put them on my website via FTP but don't know if > that is the best way. Don't have an ftp client myself so was thinking that > that may be now passé. > > I tried using Google Drive with access available via the link to two > versions of the files, UTF8 and MARC8. However, it seems that that is not > a viable solution. I can access the files with the URLs provided by > setting the access to anyone with the URL but doesn't work for some of > those testing it for me or with the links I have on my webpage.. > > I have five folders with files of about 38 MB total. I have separated the > ebooks, audio books, juvenile content, miscellaneous and non-Latin scripts > such as Chinese, Modern Greek. Most of the content is in the ebook folder. > > I would like to make access as easy as possible. > > Google Drive seems to work for me. Here's the link to my page with the > links in case you would like to look at the folders. Works for me but not > for everyone who's tried it. > > http://dbpearsonmlis.com/ProjectGutenbergMarcRecords.html > > thanks, > dana > > -- > Dana Pearson > dbpearsonmlis.com
Re: [CODE4LIB] best way to make MARC files available to anyone
On 12 Jun 2013, at 14:06, Dana Pearson wrote: > Thanks for the replies..I had looked at GitHub but thought it something > different, ie, collaborative software development...I will look again Yes - that's the main use (git is version control software, GitHub hosts git repositories) - but of course git doesn't care what types of files you have under version control. It came to mind because I know it's been used to distribute metadata files before - e.g. this set of metadata from the Cooper Hewitt National Design Museum https://github.com/cooperhewitt/collection There could be some additional benefits gained through using git to version control this type of file, and GitHub to distribute them if you were interested, but it can act as simply a place to put the files and make them available for download. But of course the other suggestions would do this simpler task just as well. Owen
Re: [CODE4LIB] best way to make MARC files available to anyone
On 13 Jun 2013, at 02:57, Dana Pearson wrote: > quick followup on the thread.. > > github: I looked at the cooperhewitt collection but don't see a way to > download the content...I could copy and paste their content but that may > not be the best approach for my files...documentation is thin, seems i > would have to provide email addresses for those seeking access...but > clearly that is not the case with how the cooperhewitt archive is > configured.. > > My primary concern has been to make it as simple a process as possible for > libraries which have limited technical expertise. I suspect from what you say that GitHub is not what you want in this case. However, I just wanted to clarify that you can download files as a Zip file (e.g. for Cooper Hewitt https://github.com/cooperhewitt/collection/archive/master.zip), and that this link is towards the top left on each screen in GitHub. The repository is a public one (which is the default, and only option unless you have a paid account on GitHub) and you do not need to provide email addresses or anything else to access the files on a public repository Owen
Re: [CODE4LIB] Anyone have access to well-disambiguated sets of publication data?
I'd echo the other comments that finding reliable data is problematic but as a suggestion of reasonably good data you could try: Names was a Jisc funded project that as far as I know isn't currently active but the data available should be of reasonable quality I think. More details on the project available at http://names.mimas.ac.uk/files/Final_Report_Names_Phase_Two_September_2011.pdf Names: for author names + identifiers - e.g. http://names.mimas.ac.uk/individual/25256.html?&outputfields=identifiers (this one has an ISNI) Names also provides links to Journal articles - e.g. for same person http://names.mimas.ac.uk/individual/25256.html?&outputfields=resultpublications You could then use the Crossref DOI lookup service to get journal identifiers Not sure this will get you what you need but might be worth a look Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 9 Jul 2013, at 16:32, Paul Albert wrote: > I am exploring methods for author disambiguation, and I would like to have > access to one or more set of well-disambiguated data set containing: > – a unique author identifier (email address, institutional identifier) > – a unique article identifier (PMID, DOI, etc.) > – a unique journal identifier (ISSN) > > Definition for "well-disambiguated" – for a given set of authors, you know > the identity of their journal articles to a precision and recall of greater > than 90-95%. > > Any ideas? > > thanks, > Paul > > > Paul Albert > Project Manager, VIVO > Weill Cornell Medical Library > 646.962.2551
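The Crossref DOI lookup step mentioned above can be done with DOI content negotiation (requesting CSL JSON, as described elsewhere on this list), which returns the journal's ISSNs where Crossref has them. A sketch; the network call is shown but the extraction from the returned record is the point:

```python
# Sketch: resolve a DOI to its Crossref metadata via content negotiation
# and pull out the journal identifiers (ISSNs). The ISSN field is absent
# for some records, so default to an empty list.
import json
import urllib.request

def issns_for_doi(doi):
    """Fetch CSL JSON for a DOI and return its ISSN list (may be empty)."""
    req = urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"Accept": "application/vnd.citationstyles.csl+json"})
    with urllib.request.urlopen(req) as resp:
        return issns_from_record(json.load(resp))

def issns_from_record(record):
    """ISSN comes back as a list (print + electronic, when known)."""
    return record.get("ISSN", [])

# Example record, abbreviated from a real Crossref response:
sample = {"DOI": "10.1126/science.169.3946.635",
          "ISSN": ["0036-8075", "1095-9203"]}
print(issns_from_record(sample))  # ['0036-8075', '1095-9203']
```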
Re: [CODE4LIB] Releasing library holdings metadata openly on the web (was: Libraries and IT Innovation)
On the holdings front also see the work being done on a holding ontology at https://github.com/dini-ag-kim/holding-ontology (and related mailing list http://lists.d-nb.de/mailman/listinfo/dini-ag-kim-bestandsdaten) - discussion all in English Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 23 Jul 2013, at 21:14, Dan Scott wrote: > Hi Laura: > > On Tue, Jul 23, 2013 at 12:36 PM, Laura Krier wrote: > > > >> The area where I'm most involved right now is in releasing library holdings >> metadata openly on the web, in discoverable and re-usable forms. It's >> amazing to me that we still don't do this. Imagine the things that could be >> created by users and software developers if they had access to information >> about which libraries hold which resources. > > I'm really interested in your efforts on this front, and where this > work is taking place, as that's what I'm trying to do as part of my > participation in the W3C Schema Bib Extend Community Group at > http://www.w3.org/community/schemabibex/ > > See the thread starting around > http://lists.w3.org/Archives/Public/public-schemabibex/2013Jul/0068.html > where we're trying to work out how best to surface library holdings in > schema.org structured data, with one effort focusing on reusing the > "Offer" class. There are many open questions, of course, but one of > the end goals (at least for me) is to get the holdings into a place > where regular people are most likely to find them: in search results > served up by search engines like Google and Bing. > > If you're not involved in the W3C community group, maybe you should > be! And it would be great if you could point out where your work is > taking place so that we can combine forces. > > Dan
Re: [CODE4LIB] netflix search mashups w/ library tools?
From the Netflix API Terms of Use "Titles and Title Metadata may be stored for no more than twenty four (24) hours." http://developer.netflix.com/page/Api_terms_of_use Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 19 Aug 2013, at 16:59, Ken Irwin wrote: > Thanks Karen, > > This goes in a bit of a direction from what I'm hoping for and your project > does suggest that some matching to build such searches might be possible. > > What I really want is to apply LCSH and related data to the Netflix search > process, essentially dropping Netflix holdings into a library catalog > interface. I suspect you'd have to build a local cache of the OCLC data for > known Netflix items to do so, and maybe a local cache of the Netflix title > list. I wonder if either or both of those actions would violate the TOS for > the respective services. > > Ken > > -Original Message- > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen > Coombs > Sent: Monday, August 19, 2013 11:26 AM > To: CODE4LIB@LISTSERV.ND.EDU > Subject: Re: [CODE4LIB] netflix search mashups w/ library tools? > > Ken, > > I did a mashup that took Netflix's top 100 movies and looked to see if a > specific library had that item. > http://www.oclc.org/developer/applications/netflix-my-library > > You might think about doing the following. Search WorldCat for titles on a > particular topic and then check to see if the title is available via Netflix. > Netflix API for searching their catalog is pretty limited though so it might > not give you what you want. It looks like it only allows you to search their > streamable content. > > Also I had a lot of trouble with trying to match Netflix titles and library > holdings. Because there isn't a good match point. DVDs don't have ISBNs and > if you use title you can get into trouble because movies get remade. So title > + date seems to work best if you can get the information. 
> > Karen > > On Mon, Aug 19, 2013 at 8:54 AM, Ken Irwin wrote: >> Hi folks, >> >> Is anyone out there using library-like tools for searching Netflix? I'm >> imagining a world in which Netflix data gets mashed up with OCLC data or >> something like it to populate a more robustly searchable Netflix title list. >> >> Does anything like this exist? >> >> What I really want at the moment is a list of Netflix titles dealing with >> Islamic topics (Muhammed, the Qu'ran, the history of Islamic civilizations, >> the Hajj, Ramadan, etc.) for doing beyond-the-library readers' advisory in >> connection with our ALA/NEH Muslim Journey's Bookshelf. Netflix's own search >> tool is singularly awful, and I thought that the library world might have an >> interest in doing better. >> >> Any ideas? >> Thanks >> Ken
Re: [CODE4LIB] What do you want to learn about linked data?
Just a recommendation for a source of information - I've found http://linkeddatabook.com/editions/1.0/ very useful especially in thinking about the practicalities of linked data publication and consumption in applications Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 4 Sep 2013, at 15:13, "Akerman, Laura" wrote: > Karen, > > It's hard to say what "basics" are. We had a learning group at Emory that > covered a lot of the "what is it", including mostly what you've listed but > also the environment (library and cultural heritage, and larger environment), > but we had a harder time getting to the "what do you do with it" which is > what would really motivate and empower people to go ahead and get beyond > basics. > > Maybe add: > > How do you embed linked data in web pages using RDFa > (Difference between RDFa and schema.org/other microdata) > How do you harvest linked data from web pages, endpoints, or other modes of > delivery? > Different serializations and how to convert > How do you establish relations between different "vocabularies" (classes and > properties) using RDFS and OWL? > (Demo) New answers to your questions enabled by combining and querying linked > data! > > Maybe a step toward "what can you do with it" would be to show (or have an > exercise): > > How can a web application interface with linked data? > > I suspect there are a lot of people who've read about it and/or have had > tutorials here and there, and who really want to get their hands in it. > That's where there's a real dearth of training. > > An "intermediate level" workshop addressing (but not necessarily answering!) > questions like: > > Do you need a triplestore or will a relational database do? > Do you need to store your data as RDF or can you do everything you need with > XML or some other format, converting on the way out or in? 
> Should you query external endpoints in real time in your application, or > cache the data? > Other than SPARQL, how do you "search" linked data? Indexing strategies... > tools... > If asserting OWL "sameAs" is too dangerous in your context, what other > strategies for expressing "close to it" relationships between resources > (concepts) might work for you? > Advanced SPARQL using regular expressions, CREATE, etc. > Care and feeding of triplestores (persistence, memory, ) > Costing out linked data applications: > How much additional server space and bandwidth will I (my institution) need > to provision in order to work with this stuff? > Open source, "free", vs. commercial management systems? > Backward conversion -transformations from linked data to other data > serializations (e.g. metadata standards in XML). > What else? > > Unfortunately (or maybe just, how it is) no one has built an interface that > hides all the programming and technical details from people but lets them > experience/experiment with this stuff (have they?). So some knowledge is > necessary. What are prerequisites and how could we make the burden of > knowing them not so onerous to people who don't have much experience in web > programming or system administration, so they could get value from a > tutorial,? > > Laura > > Laura Akerman > Technology and Metadata Librarian > Room 208, Robert W. Woodruff Library > Emory University, Atlanta, Ga. 30322 > (404) 727-6888 > lib...@emory.edu > > > > -Original Message- > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen > Coyle > Sent: Wednesday, September 04, 2013 4:59 AM > To: CODE4LIB@LISTSERV.ND.EDU > Subject: Re: [CODE4LIB] What do you want to learn about linked data? > > All, > > I had a few off-list requests for basics - what are the basic things that > librarians need to know about linked data? 
I have a site where I am putting > up a somewhat crudely designed tutorial (with exercises): > > http://kcoyle.net/metadata/ > > As you can see, it is incomplete, but I work away on it when so inspired. It > includes what I consider to be the basic knowledge: > > 1. What is metadata? > 2. Data vs. text > 3. Identifiers (esp. URIs) > 4. Statements (not records) (read: triples) 5. Semantic Web basics 6. URIs > (more in depth) 7. Ontologies 8. Vocabularies > > I intend to link various slide sets to this, and anyone is welcome to make > use of the content there. It would be GREAT for it to become an actual > tutorial, perhaps using better software, but I ha
Re: [CODE4LIB] Open Source ERM
I'm involved in the GOKb project, and also a related project in the UK called 'KB+' which is a national service providing a knowledgebase and the ability to manage subscriptions/licences. As Adam said, GOKb is definitely more of a service - although the software could be run by anyone, it isn't designed with ERM functionality in mind. GOKb is a community-managed knowledgebase, and so far much of the work has been to build a set of tools for bringing in data from publishers and content providers, and to store and manage that data. In the not too distant future GOKb will provide data via APIs for use in downstream systems. Two specific downstream systems GOKb is going to be working with are the Kuali OLE system (https://www.kuali.org/ole) and the KB+ system mentioned above. KB+ started with very similar ideas to GOKb in terms of building a community-managed knowledgebase, but with the UK HE community specifically in mind. However, it is clear that collaborating with GOKb will have significant benefits and help the community focus its effort in a single knowledgebase, and so it is expected that eventually KB+ will consume data from GOKb, and the community will contribute to the data managed in GOKb. However, KB+ also provides more ERM-style functionality to UK universities. Each institution can set up its own subscriptions and licences, drawing on the shared knowledgebase information which is managed centrally by a team at Jisc Collections (who negotiate licences for much of the content in the UK, among other things). I think the KB+ software could work as a standalone ERM in terms of functionality, but its strength is as a multi-institution system with a shared knowledgebase. 
We are releasing v3.3 next week, which brings integration with various discussion forum software - hoping we can put community discussion and collaboration at the heart of the product. Development on both KB+ and GOKb is being done by a UK software house called Knowledge Integration, and while licences for the respective code bases have not yet been applied, both should be released under an open licence in the future. However, the code is already on GitHub if anyone is interested: http://github.com/k-int/KBPlus/ https://github.com/k-int/gokb-phase1 In both cases they are web apps written in Groovy. GOKb has the added complication/interest of also having an Open (was Google) Refine extension, as this is the tool chosen for loading messy e-journal data into the system. Sorry to go on - hope the above is of some interest. Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 20 Sep 2013, at 16:26, Karl Holten wrote: > A couple of months ago our organization began looking at new ERM solutions / > link resolvers, so I thought I'd share my thoughts based on my research of > the topic. Unfortunately, I think this is one area where open source > offerings are a bit thin. Many offerings look promising at first but are no > longer under development. I'd be careful about adopting something that's no > longer supported. Out of all the options that are no longer developed, I > thought the CUFTS/GODOT combo was the most promising. Out of the options that > seem to still be under development, there were two options that stood out: > CORAL and GOKb. Neither includes a link resolver, so they weren't good for > our needs. CORAL has the advantage of being out on the market right now. GOKb > is backed by some pretty big institutions and looks more sophisticated, but > other than some slideshows there's not a lot to look at to actually evaluate > it at the moment. 
> > Ultimately, I came to the conclusion that nothing out there right now matches > the proprietary software, especially in terms of link resolvers and in terms > of a knowledge base. If I were forced to go open source I'd say the GOKb and > CORAL look the most promising. Hope that helps narrow things down at least a > little bit. > > Regards, > Karl Holten > Systems Integration Specialist > SWITCH Consortium > 6801 North Yates Road > Milwaukee, WI 53217 > http://topcat.switchinc.org/ > > > > > -Original Message- > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of > Riesner, Giles W. > Sent: Thursday, September 19, 2013 5:33 PM > To: CODE4LIB@LISTSERV.ND.EDU > Subject: Re: [CODE4LIB] Open Source ERM > > Thank you, Peter. I took a quick look at the list and found ERMes there as > well as a few others. > Not everything under this category really fits what I'm looking for (e.g. > Calibre). I'll look a little deeper. > > Regards, > > > Giles W. Riesner, Jr., Lead Library Technician, Library Technolo
Re: [CODE4LIB] Library of Congress
+1 Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 1 Oct 2013, at 14:21, "Doran, Michael D" wrote: >> As far as I can tell the LOC is up and the offices are closed. HORRAY!! >> Let's celebrate! > > Before we start celebrating, let's consider our friends and colleagues at the > LOC (some of who are code4lib people) who aren't able to work and aren't > getting paid starting today. > > -- Michael > > # Michael Doran, Systems Librarian > # University of Texas at Arlington > # 817-272-5326 office > # 817-688-1926 mobile > # do...@uta.edu > # http://rocky.uta.edu/doran/ > >> -Original Message- >> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of >> Riley Childs >> Sent: Tuesday, October 01, 2013 5:28 AM >> To: CODE4LIB@LISTSERV.ND.EDU >> Subject: [CODE4LIB] Library of Congress >> >> As far as I can tell the LOC is up and the offices are closed. HORRAY!! >> Let's celebrate! >> >> Riley Childs >> Junior and Library Tech Manager >> Charlotte United Christian Academy >> +1 (704) 497-2086 >> Sent from my iPhone >> Please excuse mistakes
Re: [CODE4LIB] usability question: searching for a database (not in a database)
Agree with others about user testing, but from my experience it is better to get the application to react intelligently to what is typed in than to expect to control what a user is going to enter. Type-ahead suggestions may help, but I'm a fan of adding a bit of intelligence to the search app - if they type in something that finds a hit in your database A-Z, promote those in your results screen - perhaps 'featured results' above federated search results etc. Also, alongside usability testing, keep looking at what is actually being searched via the log files, and adjust over time as necessary. Owen On 30 Jul 2010, at 13:22, Sarah Weeks wrote: > Long time lurker, first time poster. > I have a little usability question I was hoping someone could give me advice > on. > I'm updating the databases page on our website and we'd like to add a search > box that would search certain fields we have set up for our databases > (title, vendor, etc...) so that even if someone doesn't remember the first > word in the title, they can quickly find the database they're looking > through without having to scroll through the whole A-Z list. > My question is: if we add a search box to our main database page, how can we > make it clear that it's for searching FOR a database and not IN a database? > Some of the choices we've considered are: > Search for a database: > Search this list: > Don't remember the name of the database? Search here: > > I'm not feeling convinced by any of them. I'm afraid when people see a > search box they're not going to bother reading the text but will just assume > it's a federated search tool. > > Any advice? > > -Sarah Beth > > -- > Sarah Beth Weeks > Interim Head Librarian of Technical Services and Systems > St Olaf College Rolvaag Memorial Library > 1510 St. Olaf Avenue > Northfield, MN 55057 > 507-786-3453 (office) > 717-504-0182 (cell)
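The 'featured results' idea above can be sketched simply: before running the federated search, check the query against the A-Z list's title and vendor fields and promote any hits. The field names, entries, and matching rule below are illustrative only:

```python
# Sketch: promote A-Z database matches as 'featured results' ahead of
# federated search results. Sample data and matching rule are illustrative.

DATABASES = [
    {"title": "Academic Search Premier", "vendor": "EBSCO"},
    {"title": "JSTOR", "vendor": "JSTOR"},
    {"title": "Web of Science", "vendor": "Thomson Reuters"},
]

def featured_results(query, databases=DATABASES):
    """Case-insensitive substring match on title or vendor."""
    q = query.strip().lower()
    return [d for d in databases
            if q and (q in d["title"].lower() or q in d["vendor"].lower())]

print([d["title"] for d in featured_results("jstor")])  # ['JSTOR']
```

In a real deployment you would likely loosen the matching (stemming, common misspellings seen in the search logs) and tune it over time, as suggested above.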
[CODE4LIB] Linking Sakai 'Citation Helper' to other systems
I'm part of a project at Oxford University in the UK that is looking at how we can enhance the 'Citation Helper' module in Sakai (which is used to provide the Oxford learning environment 'WebLearn') - enabling faculty members to add resources from the Oxford 'resource discovery' solution SOLO (Primo, from Ex Libris) and displaying holdings/availability information alongside items in the resource lists. I've just blogged some more information about the project (from my own point of view), outlining the approach we are taking. We are aiming to achieve the integrations through a 'loosely coupled' approach making use of common standards/specifications including: OpenURL; COinS; the Juice framework; DLF-ILS GetAvailability; and DAIA (possibly). All this should mean that what we do at Oxford can be easily transferred to other environments. There is a lot more detail in the blog post at http://www.meanboyfriend.com/overdue_ideas/2010/08/sir-louie/, and I'd really welcome comments/suggestions/issues/questions to inform the project as we start developing the solutions for Sir Louie. Thanks, Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936
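The COinS part of this 'loosely coupled' approach amounts to embedding an OpenURL ContextObject, as a URL-encoded KEV string, in a span that tools like Juice or a link resolver can pick up. A sketch of generating one for a journal article; the citation values are illustrative:

```python
# Sketch: generate a COinS span for a journal article citation -- an
# OpenURL ContextObject (KEV format) in the title attribute of a span
# with class Z3988. Citation values are illustrative.
from urllib.parse import urlencode

def coins_span(atitle, jtitle, issn, doi):
    kev = urlencode({
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.atitle": atitle,
        "rft.jtitle": jtitle,
        "rft.issn": issn,
        "rft_id": "info:doi/" + doi,
    })
    # escape ampersands for embedding in HTML
    return '<span class="Z3988" title="%s"></span>' % kev.replace("&", "&amp;")

print(coins_span("The Structure of Ordinary Water", "Science",
                 "0036-8075", "10.1126/science.169.3946.635"))
```

Emitting spans like this in the resource list pages is what lets a single piece of client-side code (e.g. a Juice extension) find the citations and decorate them with availability links.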
Re: [CODE4LIB] Linking Sakai 'Citation Helper' to other systems
Thanks Karen - yes, I'm following the work of the group :) Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 23 Aug 2010, at 16:54, Karen Coombs wrote: > Owen, > > It is probably worth looking at the work of ILS Interoperability group > which is has adopted the XC NCIP Toolkit to try to provide access to > ILS data like Availability. The group is very active right now and > trying to get people to build connectors for the toolkit and improve > existing code. If you are dealing with ALEPH I believe there is a > connector for ALEPH in the current version of the XC NCIP toolkit. > > You can check out our activities at http://groups.google.com/group/ils-di > > Karen > > On Mon, Aug 23, 2010 at 6:48 AM, Owen Stephens wrote: >> I'm part of a project at Oxford University in the UK that is looking at how >> we can enhance the 'Citation Helper' module in Sakai (which is used to >> provide the Oxford learning environment 'WebLearn') - enabling faculty >> members to add resources from the Oxford 'resource discovery' solution SOLO >> (Primo, from Ex Libris) and displaying holdings/availability information >> alongside items in the resource lists. >> >> I've just blogged some more information about the project (from my own point >> of view), outlining the approach we are taking. We are aiming to achieve the >> integrations through a 'loosely coupled' approach making use of common >> standards/specifications including: >> >> OpenURL >> COinS >> Juice framework >> DLF-ILS GetAvailability >> DAIA (possibly) >> >> All this should be mean that what we do at Oxford can be easily transferred >> to other environments. There is a lot more detail in the blog post at >> http://www.meanboyfriend.com/overdue_ideas/2010/08/sir-louie/, and I'd >> really welcome comments/suggestions/issues/questions to inform the project >> as we start developing the solutions for Sir Louie. 
>> >> Thanks, >> >> Owen >> >> >> Owen Stephens >> Owen Stephens Consulting >> Web: http://www.ostephens.com >> Email: o...@ostephens.com >> Telephone: 0121 288 6936 >>
[CODE4LIB] Help with DLF-ILS GetAvailability
I'm working with the University of Oxford to look at integrating some library services into their VLE/Learning Management System (Sakai). One of the services is something that will give availability for items on a reading list in the VLE (the Sakai 'Citation Helper'), and I'm looking at the DLF-ILS GetAvailability specification to achieve this. For physical items, the availability information I was hoping to use is expressed at the level of a physical collection. For example, if several college libraries within the University hold an item, I have aggregated information that tells me the availability of the item in each of the college libraries. However, I don't have item level information. I can see how I can use simpleavailability to say over the entire institution whether (e.g.) a book is available or not. However, I'm not clear I can express this in a more granular way (say availability on a library by library basis) except by going to item level. Also, although it seems you can express multiple locations in simpleavailability, and multiple availabilitymsg, there is no way I can see to link these, so although I could list each location OK, I can't attach an availabilitymsg to a specific location (unless I only express one location). Am I missing something, or is my interpretation correct? Any other suggestions? Thanks, Owen PS also looked at DAIA, which I like, but this (as far as I can tell) only allows availability to be specified at the level of items Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936
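One way round the location/message pairing problem described above is to emit a separate simpleavailability block per library, so each location carries its own availabilitymsg. A hypothetical Python sketch - the element names come from the thread, but the namespace URI and the record wrapper are my assumptions, not taken from the spec:

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI -- NOT taken from the DLF-ILS spec text.
DLF = "http://diglib.org/ilsdi/1.1"

def availability_xml(per_library):
    """per_library: list of (location_name, status_message) tuples.
    Emits one dlf:simpleavailability element per library, so each
    location is unambiguously paired with its own availabilitymsg."""
    ET.register_namespace("dlf", DLF)
    root = ET.Element(f"{{{DLF}}}record")
    for location, message in per_library:
        avail = ET.SubElement(root, f"{{{DLF}}}simpleavailability")
        loc = ET.SubElement(avail, f"{{{DLF}}}location")
        loc.text = location
        msg = ET.SubElement(avail, f"{{{DLF}}}availabilitymsg")
        msg.text = message
    return ET.tostring(root, encoding="unicode")

print(availability_xml([("Balliol College Library", "Available"),
                        ("Merton College Library", "On loan")]))
```

Whether repeating the wrapper element like this is conformant is exactly the open question in the thread; it only works cleanly if, as Dave notes, you control both ends of the exchange.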
Re: [CODE4LIB] Help with DLF-ILS GetAvailability
Sorry Jonathan - meant to say thanks - and that your blog posts were already my 'required reading' for doing anything with ils-di stuff! Owen On Wed, Oct 20, 2010 at 8:35 PM, Jonathan Rochkind wrote: > I believe you are correct. The ils-di stuff is just kind of a framework > starting point, not (yet) a complete end-to-end standards-constrained > solution. > > I believe you will find my thoughts and experiences on this issue helpful. > My own circumstances did not involve collection-level anything, but I still > ended up using an unholy mish-mash of several abused metadata formats to > express what I needed. > > http://bibwild.wordpress.com/2009/09/10/dlf-ils-di-dlfexpanded-service-for-horizon/ > > http://bibwild.wordpress.com/2009/07/31/exposing-holdings-in-dlf-ils-di-standard-format-web-service/ -- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
Re: [CODE4LIB] Help with DLF-ILS GetAvailability
Thanks Dave, Yes - my reading was that dlf:holdings was for pure 'holdings' as opposed to 'availability'. We could put the simpleavailability in there I guess, but as you say, since we are controlling both ends there doesn't seem any point in abusing it like that. The downside is we'd hoped to do something that could be taken up by other sites - the original plan was to use the Juice framework - developed by Talis using jQuery - to parse a standard availability format so that this could then be applied easily in other environments. Obviously we can still achieve the outcome we need for the immediate requirements of the project by using a custom format. Thanks again Owen On Thu, Oct 21, 2010 at 4:28 PM, Walker, David wrote: > Hey Owen, > > Seems like you could use the dlf:holdings element to hold this kind > of individual library information. > > The DLF-ILS documentation doesn't seem to think that you would use > dlf:simpleavailability here, though, but rather MARC or ISO holdings > schemas. > > But if you're controlling both ends of the communication, I don't know if > it really matters. > > --Dave > > == > David Walker > Library Web Services Manager > California State University > http://xerxes.calstate.edu -- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
Re: [CODE4LIB] Help with DLF-ILS GetAvailability
OK - thanks both, will pursue this - taking on board Jonathan's points on the issues around this Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 21 Oct 2010, at 22:07, Walker, David wrote: >> Yes - my reading was that dlf:holdings was for pure 'holdings' >> as opposed to 'availability'. > > I would agree with Jonathan that putting a summary of item availability in > dlf:holdings is not an abuse. > > For example, ISO Holdings -- one of the schemas the DLF-ILS documents > suggests using here -- has elements for things like: > > > > > > Very much the kind of summary information you are using. Those are different > from its element, which describes individual > items. > > So IMO it wouldn't be (much of) a stretch to express this in > dlf:simpleavailability instead. > > --Dave > > == > David Walker > Library Web Services Manager > California State University > http://xerxes.calstate.edu > > From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Jonathan > Rochkind [rochk...@jhu.edu] > Sent: Thursday, October 21, 2010 1:26 PM > To: CODE4LIB@LISTSERV.ND.EDU > Subject: Re: [CODE4LIB] Help with DLF-ILS GetAvailability > > I don't think that's an abuse. I consider dlf:holdings to be for > information about a "holdingset", or some collection of "items", while > dlf:item is for information about an individual item. > > I think regardless of what you do you are being over-optimistic in > thinking that if you just "do dlf", your stuff will be interchangeable with > any other clients or servers "doing dlf". The spec is way too open-ended > for that, it leaves a whole bunch of details not specified and up to the > implementer. For better or worse. I made more comments about this in > the blog post I referenced earlier. > > Jonathan
[CODE4LIB] Open Edge - Open Source in Libraries event
Is there a better place to celebrate Burns Night than Edinburgh? This could be just the excuse you were looking for... Open Edge - Open Source in Libraries This two-day event on open source software for libraries is being run in collaboration with JISC and SCONUL. The first day is 'Haggis and Mash', a Mashed Library event, while the second day covers broader issues, in particular how capacity might be built to enable open source solutions to flourish in HE and FE Libraries. Mashed Library (http://www.mashedlibrary.com/) is an informal network of library professionals who are interested in how technology can be used to enhance library services, increasing the ease of access to library data. 'Haggis and Mash' is a semi-unconference event which is designed to showcase some of the best practice from library staff from around the UK, combined with a practical element to let delegates come together and brainstorm/develop practical solutions for mashing existing library data. Haggis and Mash will have a particular focus on the use of Open Source library software, including presentations and hands-on workshops covering systems such as Evergreen, VuFind and Blacklight, as well as other Open Source projects like Juice - for a full programme see http://mashedlibrary.com/wiki/index.php?title=Haggis_and_Mash This first day is intended for anyone with an interest in the use of technology in libraries; although sessions will have technical content, the event aims to offer something to everyone - from beginners to experienced programmers. The second day of the event has a broader focus for people with a strategic role in HE and FE Libraries and IT, as well as Managers and Practitioners. The day will cover four themes: THEME ONE: Why employ OSS library solutions? (the key issues) There are a number of reports on the overall benefits of OSS. This session will summarise and analyse the benefits and some challenges. 
THEME TWO: What are the OSS solutions for libraries? (a) a summary of what is available, including vertical search, ERM, APIs, widgets, IRs, VLEs and digital preservation; (b) a look at some of the solutions in more detail, with a focus on the benefits rather than details of features. THEME THREE: What capacity do we need for OSS to flourish in libraries? THEME FOUR: How can we develop that capacity? For further information about Open Source Library Technology visit http://helibtech.com/Open+Source Hope some of you can make it Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936
[CODE4LIB] Invitation to the Opening Data – Opening Doors Workshop, Manchester (UK), 18th April
How can we gain bigger audiences for our scholarly and cultural resources and enhance services for researchers, teachers and learners? In 2010, the JISC and RLUK Resource Discovery Taskforce (RDTF), involving national stakeholders from libraries, archives and museums, set out a vision for making the most of UK scholarly and cultural resources. JISC and their RDTF partners have now committed to a programme of activity to help fulfil the vision – building critical mass through opening up data, exploring and demonstrating what open data makes possible, and actively sharing learning points with the wider community. The ‘Opening Data – Opening Doors’ event marks the starting point of this journey. We are looking for developers/tech-interested people to contribute to this event - tell us:
- How you can use data describing these resources - what innovative services or products could be delivered?
- What things can be done in terms of format/licensing/APIs to make exploiting this data as easy as possible?
- Do you have data you can contribute? Are there any barriers to contributing data (technical or other), and how could these be overcome?
- What excites/would excite you about this attempt to open up scholarly/cultural resources and enhance services?
(you can get a flavour of what is already happening from this newsletter http://rdtf.mimas.ac.uk/newsletter/rdtfnewsletter01-march2011.pdf) Come to the event to:
· Hear from services that are opening up their data, including what’s happening in the new RDTF projects that have just been commissioned
· Help to shape the messages, advice and support offered during the 2011 programme and beyond
· Help to develop practical and engaging approaches to exploiting our data
Venue: Malmaison Manchester, Piccadilly, Manchester, M1 1LZ Date: Monday 18th April 2011, 10.00am to 4.00pm Who should attend? Managers, practitioners, developers and advocates from libraries, archives, museums, associated publishers and interested organisations who want early involvement in clarifying, expanding and challenging the realities of exposing, sharing and exploiting the resource description data held by our institutions. Register at: http://rdtf-opening-doors.eventbrite.com/ There is already a lot happening. To find out more download a copy of the first RDTF newsletter at: http://rdtf.mimas.ac.uk/newsletter/rdtfnewsletter01-march2011.pdf For more information on the JISC and RLUK Resource Discovery Taskforce, visit: http://rdtf.mimas.ac.uk -- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
Re: [CODE4LIB] MARC magic for file
"I'm sure any decent MARC tool can deal with them, since decent MARC tools are certainly going to be forgiving enough to deal with four characters that apparently don't even really matter." You say that, but I'm pretty sure Marc4J throws errors on MARC records where these characters are incorrect Owen On Fri, Apr 1, 2011 at 3:51 AM, William Denton wrote: > On 28 March 2011, Ford, Kevin wrote: > >> I couldn't get Simon's MARC 21 Magic file to work. Among other issues, I >> received "line too long" errors. But, since I've been curious about this >> for some time, I figured I'd take a whack at it myself. Try this: >> > > This is very nice! Thanks. I tried it on a bunch of MARC files I have, > and it recognized almost all of them. A few it didn't, so I had a closer > look, and they're invalid. > > For example, the Internet Archive's Binghamton catalogue dump: > > http://ia600307.us.archive.org/6/items/marc_binghamton_univ/ > >
> $ file -m marc.magic bgm*mrc
> bgm_openlib_final_0-5.mrc: data
> bgm_openlib_final_10-15.mrc: MARC Bibliographic
> bgm_openlib_final_15-18.mrc: data
> bgm_openlib_final_5-10.mrc: MARC Bibliographic
>
> But why? Aha:
>
> $ head -c 25 bgm_openlib_final_*mrc
> ==> bgm_openlib_final_0-5.mrc <==
> 01812cas 2200457 45x00
> ==> bgm_openlib_final_10-15.mrc <==
> 01008nam 2200289ua 45000
> ==> bgm_openlib_final_15-18.mrc <==
> 01614cam00385 45 0
> ==> bgm_openlib_final_5-10.mrc <==
> 00887nam 2200265v 45000
>
> As you say, the leader should end with 4500 (as defined at > http://www.loc.gov/marc/authority/adleader.html) but two of those files > don't. So they're not valid MARC. I'm sure any decent MARC tool can deal > with them, since decent MARC tools are certainly going to be forgiving > enough to deal with four characters that apparently don't even really > matter. > > So on the one hand they're usable MARC but file wouldn't say so, and on the > other that's a good indication that the files have failed a basic validity > test. 
I wonder if there are similar situations for JPEGs or MP3s. > > I think you should definitely submit this for inclusion in the magic file. > It would be very useful for us all! > > Bill > > P.S. I'd never used head -c (to show a fixed number of bytes) before. > Always nice to find a new useful option to an old command. > >
>> # MARC 21 Magic (Second cut)
>>
>> # Set at position 0
>> 0 short >0x
>>
>> # leader ends with 4500
>> >20 string 4500
>>
>> # leader starts with 5 digits, followed by codes specific to MARC format
>> >>0 regex/1 (^[0-9]{5})[acdnp][^bhlnqsu-z] MARC Bibliographic
>> >>0 regex/1 (^[0-9]{5})[acdnosx][z] MARC Authority
>> >>0 regex/1 (^[0-9]{5})[cdn][uvxy] MARC Holdings
>> >>0 regex/1 (^[0-9]{5})[acdn][w] MARC Classification
>> >>0 regex/1 (^[0-9]{5})[cdn][q] MARC Community
>
> -- > William Denton, Toronto : miskatonic.org www.frbr.org openfrbr.org
-- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
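The validity check the magic rules encode - a 24-byte leader whose first five bytes are the record length as digits and whose bytes 20-23 are the entry map "4500" - translates to a few lines of Python. A hedged sketch, not part of the thread:

```python
def looks_like_marc(leader: bytes) -> bool:
    """Rough equivalent of the magic-file test: a MARC leader is 24 bytes,
    positions 0-4 hold the record length as ASCII digits, and positions
    20-23 hold the entry map '4500'."""
    return (len(leader) >= 24
            and leader[0:5].isdigit()
            and leader[20:24] == b"4500")

print(looks_like_marc(b"01008nam a2200289ua 4500"))  # well-formed leader
print(looks_like_marc(b"01812cas a2200457 45x00 "))  # entry map is not 4500
```

Like the magic rules, this only checks leader well-formedness; as the thread notes, records failing it may still be perfectly usable by forgiving MARC tools.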
[CODE4LIB] LCSH and Linked Data
We are working on converting some MARC library records to RDF, and looking at how we handle links to LCSH (id.loc.gov) - and I'm looking for feedback on how we are proposing to do this... I'm not 100% confident about the approach, and to some extent I'm trying to work around the nature of how LCSH interacts with RDF at the moment, I guess... but here goes - I would very much appreciate feedback/criticism/being told why what I'm proposing is wrong: I guess what I want to do is preserve aspects of the faceted nature of LCSH in a useful way, give useful links back to id.loc.gov where possible, and give access to a wide range of facets on which the data set could be queried. Because of this I'm proposing not just expressing the whole of the 650 field as a LCSH and checking for its existence on id.loc.gov, but also checking for various combinations of topical term and subdivisions from the 650 field. So for any 650 field I'm proposing we should check on id.loc.gov for labels matching:
check(650$$a) --> topical term
check(650$$b) --> topical term
check(650$$v) --> Form subdivision
check(650$$x) --> General subdivision
check(650$$y) --> Chronological subdivision
check(650$$z) --> Geographic subdivision
Then using whichever elements exist (all as topical terms):
Check(650$$a--650$$b)
Check(650$$a--650$$v)
Check(650$$a--650$$x)
Check(650$$a--650$$y)
Check(650$$a--650$$z)
Check(650$$a--650$$b--650$$v)
Check(650$$a--650$$b--650$$x)
Check(650$$a--650$$b--650$$y)
Check(650$$a--650$$b--650$$z)
Check(650$$a--650$$b--650$$x--650$$v)
Check(650$$a--650$$b--650$$x--650$$y)
Check(650$$a--650$$b--650$$x--650$$z)
Check(650$$a--650$$b--650$$x--650$$z--650$$v)
Check(650$$a--650$$b--650$$x--650$$z--650$$y)
Check(650$$a--650$$b--650$$x--650$$z--650$$y--650$$v)
As an example, given:
650 00 $$aPopular music$$xHistory$$y20th century
We would be checking id.loc.gov for:
'Popular music' as a topical term (http://id.loc.gov/authorities/sh85088865)
'History' as a general subdivision (http://id.loc.gov/authorities/sh99005024)
'20th century' as a chronological subdivision (http://id.loc.gov/authorities/sh2002012476)
'Popular music--History and criticism' as a topical term (http://id.loc.gov/authorities/sh2008109787)
'Popular music--20th century' as a topical term (not authorised)
'Popular music--History and criticism--20th century' as a topical term (not authorised)
And expressing all matches in our RDF. My understanding of LCSH isn't what it might be - but the ordering of terms in the combined string checking is based on what I understand to be the usual order - is this correct, and should we be checking for alternative orderings? Thanks Owen -- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
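Each candidate label above has to be tested against id.loc.gov. A hedged sketch of how such a lookup might be driven - the /authorities/label/ path is my assumption about id.loc.gov's label-resolution service (which answers with a redirect to the authority record when a label is authorised), so verify it against the current documentation:

```python
import urllib.error
import urllib.parse
import urllib.request

def label_url(label):
    """Build a lookup URL for a candidate LCSH label.
    NOTE: the /authorities/label/ path is an assumption, not taken
    from the thread -- check the live id.loc.gov docs."""
    return "http://id.loc.gov/authorities/label/" + urllib.parse.quote(label)

def resolve(label):
    """Return the URI the service redirects to for an authorised label,
    or None if the label is not found (HTTP error, e.g. 404)."""
    req = urllib.request.Request(label_url(label), method="HEAD")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.url  # final URL after following any redirect
    except urllib.error.HTTPError:
        return None

print(label_url("Popular music--History and criticism"))
```

resolve() is not exercised here since it needs network access; whether HEAD requests are supported, and exactly what the service returns for unauthorised labels, should be confirmed against the live service before relying on this.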
Re: [CODE4LIB] LCSH and Linked Data
Thanks Tom - very helpful Perhaps this suggests that rather than using a fixed order we should check combinations while preserving the order of the original 650 field (I assume this should in theory always be correct - or at least done to the best of the cataloguer's knowledge)? So for: 650 _0 $$a Education $$z England $$x Finance. check:
Education
England (subdiv)
Finance (subdiv)
Education--England
Education--Finance
Education--England--Finance
While for 650 _0 $$a Education $$x Economic aspects $$z England we check:
Education
Economic aspects (subdiv)
England (subdiv)
Education--Economic aspects
Education--England
Education--Economic aspects--England
> > - It is possible for other orders in special circumstances, e.g. with > language dictionaries which can go something like: > > 650 _0 $$a English language $$v Dictionaries $$x Albanian. > This possibility would also be covered by preserving the order - check:
English Language
Dictionaries (subdiv)
Albanian (subdiv)
English Language--Dictionaries
English Language--Albanian
English Language--Dictionaries--Albanian
Creating possibly invalid headings isn't necessarily a problem - as we won't get a match on id.loc.gov anyway. (Instinctively English Language--Albanian doesn't feel right) > > - Some of these are repeatable, so you can have two $$vs following each > other (e.g. Biography--Dictionaries); two $$zs (very common), as in > Education--England--London; two $$xs (e.g. Biography--History and criticism). > > OK - that's fine, we can use each individually and in combination for any repeated headings I think > - I'm not sure I've ever come across a lot of $$bs in 650s. Do you have a lot of > them in the database? > > Hadn't checked until you asked! We have 1 in the dataset in question (c.30k records) :) > I'm not sure how possible it would be to come up with a definitive list of > (reasonable) possible combinations. 
You are probably right - but I'm not too bothered about aiming at 'definitive' at this stage anyway - but I do want to get something relatively functional/useful > Tom > > Thomas Meehan > Head of Current Cataloguing > University College London Library Services
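The order-preserving rule discussed in this message - keep $$a first and take every subset of the remaining subdivisions in their original field order - can be sketched as follows (a hypothetical helper, not from the thread; the individual-subdivision checks would be handled separately):

```python
from itertools import combinations

def candidate_headings(subfields):
    """subfields: ordered list of subfield values with $$a first, e.g.
    ['Education', 'England', 'Finance'] for
    650 _0 $$a Education $$z England $$x Finance.
    Returns every candidate label that starts with the topical term,
    joining with '--' and preserving the original subfield order."""
    head, rest = subfields[0], subfields[1:]
    labels = [head]
    for n in range(1, len(rest) + 1):
        # itertools.combinations preserves input order, which is what we want
        for combo in combinations(rest, n):
            labels.append("--".join([head, *combo]))
    return labels

print(candidate_headings(["Education", "England", "Finance"]))
# ['Education', 'Education--England', 'Education--Finance',
#  'Education--England--Finance']
```

This reproduces the Education example from the message; possibly invalid orderings (like English Language--Albanian) simply fail to match and are discarded.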
Re: [CODE4LIB] LCSH and Linked Data
Still digesting Andrew's response (thanks Andrew), but On Thu, Apr 7, 2011 at 4:17 PM, Ya'aqov Ziso wrote: > *Currently under id.loc.gov you will not find name authority records, but > you can find them at viaf.org*. > *[YZ]* viaf.org does not include geographic names. I just checked there for > England. > Is this not the relevant VIAF entry? http://viaf.org/viaf/142995804 -- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
Re: [CODE4LIB] LCSH and Linked Data
I'm out of my depth here :) But... this is what I understood Andrew to be saying. In this instance (?because 'England' is a Name Authority?) rather than create a separate LCSH authority record for 'England' (as the 151), the LCSH subdivision is recorded in the 781 of the existing Name Authority record. Searching on http://authorities.loc.gov for England, I find an Authorised heading, marked as a LCSH - but when I go to that record what I get is the name authority record n 82068148 - the name authority record as represented on VIAF by http://viaf.org/viaf/142995804/ (which links to http://errol.oclc.org/laf/n%20%2082068148.html) Just as this is getting interesting, time differences mean I'm about to head home :) Owen On Thu, Apr 7, 2011 at 4:34 PM, LeVan,Ralph wrote: > If you look at the fields those names come from, I think they mean > England as a corporation, not England as a place. > > Ralph -- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
Re: [CODE4LIB] LCSH and Linked Data
Thanks for all the information and discussion. I don't think I'm familiar enough with Authority file formats to comprehend it all completely - but I certainly understand the issues around the question of 'place' vs 'histo-geo-political entity'. Some of this makes me worry about the immediate applicability of the LC Authority files in the Linked Data space - someone said to me recently 'SKOS is just a way of avoiding dealing with the real semantics' :) Anyway - putting that to one side, the simplest approach for me at the moment seems to be to look only at authorised LCSH as represented on id.loc.gov. Picking up on Andy's first response: On Thu, Apr 7, 2011 at 3:46 PM, Houghton, Andrew wrote: > After having done numerous matching and mapping projects, there are some > issues that you will face with your strategy, assuming I understand it > correctly. Trying to match a heading starting at the left most subfield and > working forward will not necessarily produce correct results when matching > against the LCSH authority file. Using your example: > > > > 650 _0 $a Education $z England $x Finance > > > > is a good example of why processing the heading starting at the left will > not necessarily produce the correct results. Assuming I understand your > proposal you would first search for: > > > > 150 __ $a Education > > > > and find the heading with LCCN sh85040989. Next you would look for: > > > > 181 __ $z England > > > > and you would NOT find this heading in LCSH. > OK - ignoring the question of where the best place to look for this is - I can live with not matching it for now. Later (perhaps when I understand it better, or when these headings are added to id.loc.gov) we can revisit this > The second issue using your example is that you want to find the "longest" > matching heading. While the pieces are there, so is the enumerated > authority heading: > > > > 150 __ $a Education $z England > > > > as LCCN sh2008102746. 
So your heading is actually composed of the > enumerated headings: > > > > sh2008102746  150 __ $a Education $z England > > sh2002007885  180 __ $x Finance > > > > and not the separate headings: > > > > sh85040989  150 __ $a Education > > n82068148  150 __ $a England > > sh2002007885  180 __ $x Finance > > > > Although one could argue that either analysis is correct depending upon > what you are trying to accomplish. > > > What I'm interested in is representing the data as RDF/Linked Data in a way that opens up the best opportunities for both understanding and querying the data. Unfortunately at the moment there isn't a good way of representing LCSH directly in RDF (the MADS work may help I guess, but to be honest at the moment I see that as overly complex - but that's another discussion). What I can do is make statements that an item is 'about' a subject (probably using dc:subject) and then point at an id.loc.gov URI. However, if I only express individual headings:
Education
England (natch)
Finance
Then obviously I lose the context of the full heading - so I also want to look for Education--England--Finance (which I won't find on id.loc.gov as it's not authorised) At this point I could stop, but my feeling is that it is useful to also look for other combinations of the terms:
Education--England (not authorised)
Education--Finance (authorised! http://id.loc.gov/authorities/sh85041008)
My theory is that as long as I stick to combinations that start with a topical term I'm not going to make startlingly inaccurate statements? > The matching algorithm I have used in the past contains two routines. The > first f(a) will accept a heading as a parameter, scrub the heading, e.g., > remove unnecessary subfields like $0, $3, $6, $8, etc. and do any other > pre-processing necessary on the heading, then call the second function f(b). > The f(b) function accepts a heading as a parameter and recursively calls > itself until it builds up the list of LCCNs that comprise the heading. 
It first > looks for the given heading; when it doesn't find it, it removes the **last** > subfield and recursively calls itself, otherwise it appends the found > LCCN to the returned list and exits. This strategy will find the longest > match. > Unless I've misunderstood this, this strategy would not find 'Education--Finance'? Instead I need to remove each *subdivision* in turn (no matter where it appears in the heading order) and try all possible combinations, checking each for a match on id.loc.gov. Again, I can do this without worrying about possible invalid headings, as these wouldn't have been authorised anyway... I can check the number of variations around this, but I guess that in my limited set of records (only 30k) there will be a relatively small number of possible patterns to check. Does that make sense?
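To make the two strategies concrete, here is a hypothetical Python sketch: the AUTHORITIES dict stands in for id.loc.gov lookups (the LCCNs are the ones quoted above), longest_match/match_heading follow Andy's f(b)/f(a) recursion, and candidate_combinations follows the "remove each subdivision in turn" idea, keeping the initial topical term. All names and the dict structure are invented for illustration.

```python
from itertools import combinations

# Toy stand-in for an id.loc.gov lookup: heading subfields -> LCCN.
AUTHORITIES = {
    ("Education",): "sh85040989",
    ("Education", "England"): "sh2008102746",
    ("Finance",): "sh2002007885",
    ("Education", "Finance"): "sh85041008",
}

def longest_match(subfields):
    """f(b): look up the heading; if not found, drop the LAST subfield
    and recurse. Returns (lccn, subfields_matched) or None."""
    if not subfields:
        return None
    key = tuple(subfields)
    if key in AUTHORITIES:
        return AUTHORITIES[key], len(subfields)
    return longest_match(subfields[:-1])

def match_heading(subfields):
    """f(a): scrubbing ($0, $3, $6, $8...) would happen here; then repeatedly
    take the longest match from the front and continue with the remainder."""
    lccns, rest = [], list(subfields)
    while rest:
        found = longest_match(rest)
        if found is None:
            break  # unmatched remainder, e.g. a free-floating subdivision
        lccn, n = found
        lccns.append(lccn)
        rest = rest[n:]
    return lccns, rest

def candidate_combinations(subfields):
    """Owen's variant: every order-preserving combination that keeps the
    first (topical) term, longest candidates first."""
    first, rest = subfields[0], subfields[1:]
    for n in range(len(rest), -1, -1):
        for combo in combinations(rest, n):
            yield (first,) + combo

# Education--England--Finance decomposes to Education--England + Finance
print(match_heading(["Education", "England", "Finance"]))
```

For Education--England--Finance the recursive strategy yields sh2008102746 plus sh2002007885, while the combinations generator also surfaces Education--Finance (sh85041008) as a candidate, which the longest-match recursion alone would miss.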
Re: [CODE4LIB] LCSH and Linked Data
Thanks Ross - I have been pushing some cataloguing folk to comment on some of this as well (and have some feedback) - but I take the point that wider consultation via autocat could be a good idea. (for some reason this makes me slightly nervous!) In terms of whether Education--England--Finance is authorised or not - I think I took from Andy's response that it wasn't, but also looking at it on authorities.loc.gov it isn't marked as 'authorised'. Anyway - the relevant thing for me at this stage is that I won't find a match via id.loc.gov - so I can't get a URI for it anyway. There are clearly quite a few issues with interacting with LCSH as Linked Data at the moment - I'm not that keen on how this currently works, and my reaction to the MADS/RDF ontology is similar to that of Bruce D'Arcus (see http://metadata.posterous.com/lcs-madsrdf-ontology-and-the-future-of-the-se), but on the other hand I want to embrace the opportunity to start joining some stuff up and seeing what happens :) Owen On Fri, Apr 8, 2011 at 3:10 PM, Ross Singer wrote: > On Fri, Apr 8, 2011 at 5:02 AM, Owen Stephens wrote: > > > Then obviously I lose the context of the full heading - so I also want to > > look for > > Education--England--Finance (which I won't find on id.loc.gov as not > > authorised) > > > > At this point I could stop, but my feeling is that it is useful to also > look > > for other combinations of the terms: > > > > Education--England (not authorised) > > Education--Finance (authorised! http://id.loc.gov/authorities/sh85041008 > ) > > > > My theory is that as long as I stick to combinations that start with a > > topical term I'm not going to make startlingly inaccurate statements? > > I would definitely ask this question somewhere other than Code4lib > (autocat, maybe?), since I think the answer is more complicated than > this (although they could validate/invalidate your assumption about > whether or not this approach would get you "close enough"). 
> > My understanding is that Education--England--Finance *is* authorized, > because Education--Finance is and England is a free-floating > geographic subdivision. Because it's also an authorized heading, > "Education--England--Finance" is, in fact, an authority. The problem > is that free-floating subdivisions cause an almost infinite number of > permutations, so there aren't LCCNs issued for them. > > This is where things get super-wonky. It's also the reason I > initially created lcsubjects.org, specifically to give these (and, > ideally, locally controlled subject headings) a publishing > platform/centralized repository, but it quickly grew to be more than > "just a side project". There were issues of how the data would be > constructed (esp. since, at the time, I had no access to the NAF), how > to reconcile changes, provenance, etc. Add to the fact that 2 years > ago, there wasn't much linked library data going on, it was really > hard to justify the effort. > > But, yeah, it would be worth running your ideas by a few catalogers to > see what they think. > > -Ross. > -- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
Re: [CODE4LIB] [dpla-discussion] Rethinking the "library" part of DPLA
I guess that people may already be familiar with the Candide 2.0 project at NYPL http://candide.nypl.org/text/ - this sounds not dissimilar to the type of approach being suggested This document is built using Wordpress with the Digress.it plugin (http://digress.it/) Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 10 Apr 2011, at 17:35, Nate Hill wrote: > Eric, thanks for finding enough merit in my post on the DPLA listserv > to repost it here. > > Karen and Peter, I completely agree with your feelings- > But my point in throwing this idea out there was that despite all of > the copyright issues, we don't really do a great job making a simple, > intuitive, branded interface for the works that *are* available - the > public domain stuff. Instead we seem to be content with knowing that > this content is out there, and letting vendors add it to their > difficult-to-use interfaces. > > I guess my hope, seeing this reposted here is that someone might have > a suggestion as to why I would not host public domain ebooks on my own > library's site. Are there technical hurdles to consider? > > I feel like I see a tiny little piece of the ebook access problem that > we *can* solve here, while some of the larger issues will indeed be > debated in forums like the DPLA for quite a while. By solving a small > problem along the way, perhaps when the giant 1923-2011 problem is > resolved we'll have a clearer path as to what type of access we might > provide. > > > On 4/10/11, Peter Murray wrote: >> I, too, have been struggling with this aspect of the discussion. (I'm on the >> DPLA list as well.) There seems to be this blind spot within the leadership >> of the group to ignore the copyright problem and any interaction with >> publishers of popular materials. 
One of the great hopes that I have for this >> group, with all of the publicity it is generating, is to serve as a voice >> and a focal point to bring authors, publishers and librarians together to >> talk about a new digital ownership and sharing model. >> >> That doesn't seem to be happening. >> >> >> Peter >> >> On Apr 10, 2011, at 10:05, "Karen Coyle" wrote: >> >>> I appreciate the spirit of this, but despair at the idea that >>> libraries organize their services around public domain works, thus >>> becoming early 20th century institutions. The gap between 1923 and >>> 2011 is huge, and it makes no sense to users that a library provide >>> services based on publication date, much less that enhanced services >>> stop at 1923. >>> >>> kc >>> >>> Quoting Eric Hellman : >>> >>>> The DPLA listserv is probably too impractical for most of Code4Lib, >>>> but Nate Hill (who's on this list as well) made this contribution >>>> there, which I think deserves attention from library coders here. >>>> >>>> On Apr 5, 2011, at 11:15 AM, Nate Hill wrote: >>>> >>>>> It is awesome that the project Gutenberg stuff is out there, it is >>>>> a great start. But libraries aren't using it right. There's been >>>>> talk on this list about the changing role of the public library in >>>>> people's lives, there's been talk about the library brand, and some >>>>> talk about what 'local' might mean in this context. I'd suggest >>>>> that we should find ways to make reading library ebooks feel local >>>>> and connected to an immediate community. Brick and mortar library >>>>> facilities are public spaces, and librarians are proud of that. We >>>>> have collections of materials in there, and we host programs and >>>>> events to give those materials context within the community. >>>>> There's something special about watching a child find a good book, >>>>> and then show it to his or her friend and talk about how awesome >>>>> it is. 
There's also something special about watching a senior >>>>> citizens book group get together and discuss a new novel every >>>>> month. For some reason, libraries really struggle with treating >>>>> their digital spaces the same way. >>>>> >>>>> I'd love to see libraries creating online conversations around >>>>> ebooks in much the same way. Take a title from project Gutenberg: >>>>> The Adventures of Huckleberry Finn. Why not host that bo
Re: [CODE4LIB] RDF for opening times/hours?
I'd suggest having a look at the Good Relations ontology http://wiki.goodrelations-vocabulary.org/Quickstart - it's aimed at businesses but the OpeningHours specification might do what you need http://www.heppnetz.de/ontologies/goodrelations/v1.html#OpeningHoursSpecification - while handling public holidays etc. is not immediately obvious, it is covered in this mail http://ebusiness-unibw.org/pipermail/goodrelations/2010-October/000261.html Picking up on the previous comment - Good Relations in RDFa is one of the formats Google use for Rich Snippets, and it is also picked up by Yahoo Owen On 7 Jun 2011, at 23:05, Tom Keays wrote: > There was a time, about 5 years ago, when I assumed that microformats > were the way to go and spent a bit of time looking at hCalendar for > representing iCalendar-formatted event information. > > http://microformats.org/wiki/hcalendar > > Not long after that, there was a lot of talk about RDF and RDFa for > this same purpose. Now I was confused as to whether to change my > strategy or not, but RDF Calendar seemed to be a good idea. The latter > also was nice because it could be used to syndicate event information > via RSS. > > http://pemberton-vandf.blogspot.com/2008/06/how-to-do-hcalendar-in-rdfa.html > http://www.w3.org/TR/rdfcal/ > > These days it seems to be all about HTML5 microdata, especially > because of Rich Snippets and Google's support for this approach. > > http://html5doctor.com/microdata/#microdata-action > > All three approaches allow you to embed iCalendar formatted event > information on a web page. All three of them do it differently. I'm > even more confused now than I was 5 years ago. This should not be this > hard, yet there is still no definitive way to deploy this information > and preserve the semantics of the event information. Part of this may > be because the iCalendar format, although widely used, is itself > insufficient. > > Tom
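For concreteness, a minimal Turtle sketch of the pattern - the class and property names are GoodRelations v1 terms, but the subject URI, days and hours are invented:

```turtle
# Illustrative only: <#library> and the actual opening hours are made up.
@prefix gr:  <http://purl.org/goodrelations/v1#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<#library> a gr:Location ;
    gr:hasOpeningHoursSpecification [
        a gr:OpeningHoursSpecification ;
        gr:hasOpeningHoursDayOfWeek gr:Monday, gr:Tuesday, gr:Wednesday ;
        gr:opens  "09:00:00"^^xsd:time ;
        gr:closes "17:00:00"^^xsd:time
    ] .
```

The same statements can be carried as RDFa in the page markup, which is the form Google's Rich Snippets consume.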
[CODE4LIB] PDF->text extraction
The CORE project at The Open University in the UK is doing some work on finding similarity between papers in institutional repositories (see http://core-project.kmi.open.ac.uk/ for more info). The first step in the process is extracting text from the (mainly) PDF documents harvested from repositories. We've tried iText but had issues with quality. We moved to PDFBox but are having performance issues. Any other suggestions/experience? Thanks, Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936
Re: [CODE4LIB] PDF->text extraction
Thanks to all for the info and suggestions - we'll have a look at them. Via another route I've had http://snowtide.com/PDFTextStream recommended (commercial, but looks like they are generally open to offering academic licenses for free at least for a limited period) - anyone tried that? Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 22 Jun 2011, at 03:43, Bill Janssen wrote: > Simon Spero wrote: > >> Another option is to use the ABBYY FineReader >> SDK<http://www.abbyy.com/ocr_sdk_linux/overview/>. >> Annoyingly, the linux version is one release behind the windows SDK (which >> has improved support for multi core processing of single document). Since >> Owen's problem is embarrassingly parallel, multi-core tuning isn't as >> useful as being able to run on a local cluster or regional grid. ABBYY >> software tends to be a little pricey, but the results are usually very good. > > If you're going to OCR, Nuance OmniPage is also very good, and I believe > costs about the same as FineReader. We also use tOCR, from Transym, > which is Windows-only, but very accurate and cheap. I have yet to see > decent results on complicated pages (technical papers) from either > OCRopus or Tesseract with the default models that they come with; I > believe they're both still aimed at book page OCR. > > Bill
[CODE4LIB] Developer Competition using Library/Archive/Museum data
Celebrate Liberation – A worldwide competition for open software developers & open data UK Discovery (http://discovery.ac.uk/) and the Developer Community Supporting Innovation (DevCSI) project based at UKOLN are running a global Developer Competition throughout July 2011 to build open source software applications / tools, using at least one of our 10 open data sources collected from libraries, museums and archives. Enter simply by blogging about your application and emailing the blog post URI to joy.pal...@manchester.ac.uk by the deadline of 2359 (your local time) on Monday 1 August 2011. Full details of the competition, the data sets and how to enter are at http://discovery.ac.uk/developers/competition/ There are 13 prizes including Best entry for each dataset – there are 10 datasets so there could be 10 winners of £30 Amazon vouchers and an aggregation could win more than one! Data Munging – Best example of Consolidating or Aggregating or De-duplicating or Entity matching or … one prize of £100 Amazon voucher. Overall winners – An EEE Pad Transformer for the overall winner and a £200 Amazon voucher for the Runner Up. And you can win more than once :) Specific competition tag on twitter is #discodev, but #devcsi and #ukdiscovery also good to follow/use Excited to see what people come up with - hope some of you are able to enter Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936
[CODE4LIB] Show reuse of library/archive/museum data and win prizes
the ways in which you have used this data so we can understand more fully the benefits of sharing it and improve our services. Please contact metad...@bl.uk if you wish to share your experiences with us and those that are using this service. Give Credit Where Credit is Due: The British Library has a responsibility to maintain its bibliographic data on the nation’s behalf. Please credit all use of this data to the British Library and link back to www.bl.uk/bibliographic/datafree.html in order that this information can be shared and developed with today’s Internet users as well as future generations. Duplicate of package:bluk-bnb

Tyne and Wear Museums Collections (Imagine)
Part of the Europeana Linked Open Data, this is a collection of metadata describing (and linking to digital copies where appropriate) items in the Tyne and Wear Museums Collections.

Cambridge University Library dataset #1
This data marks the first major output of the COMET project. COMET is a JISC funded collaboration between Cambridge University Library and CARET, University of Cambridge. It is funded under the JISC Infrastructure for Resource Discovery programme. It represents work over a 20+ year period which contains a number of changes in practices and cataloguing tools. No attempt has been made to screen for quality of records other than the Voyager export process. This data also includes the 180,000 'Tower Project' records published under the JISC Open Bibliography Project.

JISC MOSAIC Activity Data
The JISC MOSAIC (www.sero.co.uk/jisc-mosaic.html) project gathered together data covering user activity in a few UK Higher Education libraries. The data is available for download and via an API and contains information on books borrowed during specific time periods, and where available describes links between books, courses, and year of study.

OpenURL Router Data (EDINA)
EDINA is making the OpenURL Router Data available from April 2011. 
It is derived from the logs of the OpenURL Router, which directs user requests for academic papers to the appropriate institutional resolver. It enables institutions to register their resolver once only, at [http://openurl.ac.uk](http://openurl.ac.uk "OpenURL Router"), and service providers may then use openurl.ac.uk as the “base URL” for OpenURL links for UK HE and FE customers. This is the product of JISC-funded project activity, and provides a unique data set. The data captured varies from request to request since different users enter different information into requests. Further information on the details of the data set, sample files and the data itself is available at [http://openurl.ac.uk/doc/data/data.html](http://openurl.ac.uk/doc/data/data.html "OpenURL Router Data"). The team would like to thank all the institutions involved in this initiative for their participation. The data are made available under the Open Data Commons (ODC) Public Domain Dedication and Licence and the ODC Attribution Sharealike Community Norms. Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936
Re: [CODE4LIB] CIS students, service learning, and the library
I was going to point to that too, and also note that the DevXS event was the brainchild of two students at the University of Lincoln, who went on to work at the University - including developing 'Jerome', a library search interface using MongoDB and the Sphinx index/search s/w http://jerome.library.lincoln.ac.uk/ Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 13 Oct 2011, at 23:04, Robert Robertson wrote: > Hi Ellen, > > The event hasn't been held yet but it might be worth taking a look at what > DevCSI are doing with their DevXS event http://devxs.org/ and seeing what > comes out of it after the fact. > > The DevCSI initiative (http://devcsi.ukoln.ac.uk/blog/) has run quite a few > hackday events (including dev8D) as part of an effort to build a stronger > community of developers in HE in the UK and some of their events and > challenges have been around library data. > > DevXS is their first major foray into trying the same idea with CS and other > students but it might offer some ideas for events that could raise interest > in longer term service learning projects or tackle specific tasks. > > cheers, > John > > > R. John Robertson > skype: rjohnrobertson > Research Fellow/ Open Education Resources programme support officer (JISC > CETIS), > Centre for Academic Practice and Learning Enhancement > University of Strathclyde > Tel:+44 (0) 141 548 3072 > http://blogs.cetis.ac.uk/johnr/ > The University of Strathclyde is a charitable body, registered in Scotland, > with registration number SC015263 > > From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ellen K. 
> Wilson [ewil...@jaguar1.usouthal.edu] > Sent: Thursday, October 13, 2011 9:29 PM > To: CODE4LIB@LISTSERV.ND.EDU > Subject: [CODE4LIB] CIS students, service learning, and the library > > I am wondering if anyone has experience working with students > (particularly CIS students) in service learning projects involving the > library. I am currently supervising four first-year students who are > working on a brief (10 hour) project involving the usability and > redesign of the homepage as part of a first year seminar course. > Obviously we won't get the whole thing done, but it is providing us with > some valuable student insight into what should be on the page, etc. > > I anticipate the CIS department's first-year experience program will > want to continue this collaboration, so I'm trying to brainstorm some > projects that might be useful for future semesters particularly for > freshmen who are just beginning their course of study in computer > science, information technology, or information systems. This semester's > project was thrown together in only a few days and I would like to not > do that again! Ideas would be appreciated. > > Best regards, > > Ellen > > -- > Ellen Knowlton Wilson > Instructional Services Librarian > Room 250, University Library > University of South Alabama > 5901 USA Drive North > Mobile, AL 36688 > (251) 460-6045 > ewil...@jaguar1.usouthal.edu
[CODE4LIB] Mobile technologies in libraries - fact finding survey
The m-libraries support project (http://www.m-libraries.info/) is part of JISC’s Mobile Infrastructure for Libraries programme (http://infteam.jiscinvolve.org/wp/2011/10/11/mobile-infrastructure-for-libraries-new-projects/) running from November 2011 until September 2012. The project aims to build a collection of useful resources and case studies based on current developments using mobile technologies in libraries, and to foster a community for those working in the m-library area or interested in learning more. A brief introductory survey has been devised to help inform the project - as a way of starting to gather information, to discover what information is needed to help libraries decide on a way forward, and to begin to understand what an m-libraries community could offer to help. The survey should only take 5-10 minutes and all questions are optional. This is an open survey - please pass the survey link on to anyone else you think might be interested via email or social media: http://svy.mk/mlibs1 If you’re interested in mobile technologies in libraries and would like to receive updates about the project, please visit our project blog at http://m-libraries.info and subscribe to updates (links in the right hand side for RSS or email subscriptions). Thanks and best wishes, Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936
Re: [CODE4LIB] Models of MARC in RDF
It would be great to start collecting transforms together - just a quick brain dump of some I'm aware of

MARC21 transformations
Cambridge University Library - http://data.lib.cam.ac.uk - transformation made available (in code) from same site
Open University - http://data.open.ac.uk - specific transform for materials related to teaching, code available at http://code.google.com/p/luceroproject/source/browse/trunk%20luceroproject/OULinkedData/src/uk/ac/open/kmi/lucero/rdfextractor/RDFExtractor.java (MARC transform is in libraryRDFExtraction method)
COPAC - small set of records from the COPAC Union catalogue - data and transform not yet published
Podes Projekt - LinkedAuthors - documentation at http://bibpode.no/linkedauthors/doc/Pode-LinkedAuthors-Documentation.pdf - 2 stage transformation firstly from MARC to FRBRized version of data, then from FRBRized data to RDF. These linked from documentation
Podes Project - LinkedNonFiction - documentation at http://bibpode.no/linkednonfiction/doc/Pode-LinkedNonFiction-Documentation.pdf - MARC data transformed using xslt https://github.com/pode/LinkedNonFiction/blob/master/marcslim2n3.xsl
British Library British National Bibliography - http://www.bl.uk/bibliographic/datafree.html - data model documented, but no code available
Libris.se - some notes in various presentations/blogposts (e.g. http://dc2008.de/wp-content/uploads/2008/09/malmsten.pdf) but can't find explicit transformation
Hungarian National library - http://thedatahub.org/dataset/hungarian-national-library-catalog and http://nektar.oszk.hu/wiki/Semantic_web#Implementation - some information on ontologies used but no code or explicit transformation (not 100% sure this is from MARC)
Talis - implemented in several live catalogues including http://catalogue.library.manchester.ac.uk/ - no documentation or code afaik although some notes in

MAB transformation
HBZ - some of the transformation documented at https://wiki1.hbz-nrw.de/display/SEM/Converting+the+Open+Data+from+the+hbz+to+BIBO, don't think any code published?

Would be really helpful if more projects published their transformations (or someone told me where to look!)

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 26 Nov 2011, at 15:58, Karen Coyle wrote:

> A few of the code4lib talk proposals mention projects that have or will
> transform MARC records into RDF. If any of you have documentation and/or
> examples of this, I would be very interested to see them, even if they are
> "under construction."
>
> Thanks,
> kc
>
> --
> Karen Coyle
> kco...@kcoyle.net http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet
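The core of every transform listed above is a field-level mapping from MARC tags/subfields to RDF predicates. A purely illustrative Python sketch of that idea - the record, the mapping table, and the example.org URI are all invented; real transforms also handle indicators, repeated fields, authority links and URI minting:

```python
# Invented record, flattened to tag -> {subfield code: value}.
RECORD = {
    "245": {"a": "The structure of ordinary water"},
    "260": {"b": "AAAS", "c": "1970"},
    "020": {"a": "0123456789"},
}

# Invented mapping: (tag, subfield) -> RDF predicate as a CURIE.
# dct: = Dublin Core Terms, bibo: = Bibliographic Ontology.
MAPPING = {
    ("245", "a"): "dct:title",
    ("260", "b"): "dct:publisher",
    ("260", "c"): "dct:issued",
    ("020", "a"): "bibo:isbn10",
}

def to_triples(subject, record):
    """Emit simple (subject, predicate, literal) triples for mapped fields."""
    triples = []
    for (tag, code), predicate in MAPPING.items():
        value = record.get(tag, {}).get(code)
        if value is not None:
            triples.append((subject, predicate, value))
    return sorted(triples)

for triple in to_triples("<http://example.org/item/1>", RECORD):
    print(triple)
```

The interesting (and divergent) design decisions in the real projects sit outside this sketch: which ontologies to map to, and when a subfield should become a URI rather than a literal.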
Re: [CODE4LIB] Models of MARC in RDF
Hi Esme - thanks for this. Do you have any documentation on which predicates you've used and MODS->RDF transformation? Owen On 2 Dec 2011, at 16:07, Esme Cowles wrote: > Owen- > > Another strategy for capturing MARC data in RDF is to convert it to MODS (we > do this using the LoC MARC to MODS stylesheet: > http://www.loc.gov/standards/marcxml/xslt/MARC21slim2MODS.xsl). From there, > it's pretty easy to incorporate into RDF. There are some issues to be aware > of, such as how to map the MODS XML names to predicates and how to handle > elements that can appear in multiple places in the hierarchy. > > -Esme > -- > Esme Cowles > > "Necessity is the plea for every infringement of human freedom. It is the > argument of tyrants; it is the creed of slaves." -- William Pitt, 1783 > > On 11/28/2011, at 8:25 AM, Owen Stephens wrote: > >> It would be great to start collecting transforms together - just a quick >> brain dump of some I'm aware of >> >> MARC21 transformations >> Cambridge University Library - http://data.lib.cam.ac.uk - transformation >> made available (in code) from same site >> Open University - http://data.open.ac.uk - specific transform for materials >> related to teaching, code available at >> http://code.google.com/p/luceroproject/source/browse/trunk%20luceroproject/OULinkedData/src/uk/ac/open/kmi/lucero/rdfextractor/RDFExtractor.java >> (MARC transform is in libraryRDFExtraction method) >> COPAC - small set of records from the COPAC Union catalogue - data and >> transform not yet published >> Podes Projekt - LinkedAuthors - documentation at >> http://bibpode.no/linkedauthors/doc/Pode-LinkedAuthors-Documentation.pdf - 2 >> stage transformation firstly from MARC to FRBRized version of data, then >> from FRBRized data to RDF. 
These linked from documentation >> Podes Project - LinkedNonFiction - documentation at >> http://bibpode.no/linkednonfiction/doc/Pode-LinkedNonFiction-Documentation.pdf >> - MARC data transformed using xslt >> https://github.com/pode/LinkedNonFiction/blob/master/marcslim2n3.xsl >> >> British Library British National Bibliography - >> http://www.bl.uk/bibliographic/datafree.html - data model documented, but no >> code available >> Libris.se - some notes in various presentations/blogposts (e.g. >> http://dc2008.de/wp-content/uploads/2008/09/malmsten.pdf) but can't find >> explicit transformation >> Hungarian National library - >> http://thedatahub.org/dataset/hungarian-national-library-catalog and >> http://nektar.oszk.hu/wiki/Semantic_web#Implementation - some information on >> ontologies used but no code or explicit transformation (not 100% sure this >> is from MARC) >> Talis - implemented in several live catalogues including >> http://catalogue.library.manchester.ac.uk/ - no documentation or code afaik >> although some notes in >> >> MAB transformation >> HBZ - some of the transformation documented at >> https://wiki1.hbz-nrw.de/display/SEM/Converting+the+Open+Data+from+the+hbz+to+BIBO, >> don't think any code published? >> >> Would be really helpful if more projects published their transformations (or >> someone told me where to look!) >> >> Owen >> >> Owen Stephens >> Owen Stephens Consulting >> Web: http://www.ostephens.com >> Email: o...@ostephens.com >> Telephone: 0121 288 6936 >> >> On 26 Nov 2011, at 15:58, Karen Coyle wrote: >> >>> A few of the code4lib talk proposals mention projects that have or will >>> transform MARC records into RDF. If any of you have documentation and/or >>> examples of this, I would be very interested to see them, even if they are >>> "under construction." >>> >>> Thanks, >>> kc >>> >>> -- >>> Karen Coyle >>> kco...@kcoyle.net http://kcoyle.net >>> ph: 1-510-540-7596 >>> m: 1-510-435-8234 >>> skype: kcoylenet
Re: [CODE4LIB] Models of MARC in RDF
Oh - and perhaps just/more importantly - how do you create URIs for your data and how do you reconcile against other sources? Owen On 2 Dec 2011, at 16:07, Esme Cowles wrote: > Owen- > > Another strategy for capturing MARC data in RDF is to convert it to MODS (we > do this using the LoC MARC to MODS stylesheet: > http://www.loc.gov/standards/marcxml/xslt/MARC21slim2MODS.xsl). From there, > it's pretty easy to incorporate into RDF. There are some issues to be aware > of, such as how to map the MODS XML names to predicates and how to handle > elements that can appear in multiple places in the hierarchy. > > -Esme > -- > Esme Cowles > > "Necessity is the plea for every infringement of human freedom. It is the > argument of tyrants; it is the creed of slaves." -- William Pitt, 1783 > > On 11/28/2011, at 8:25 AM, Owen Stephens wrote: > >> It would be great to start collecting transforms together - just a quick >> brain dump of some I'm aware of >> >> MARC21 transformations >> Cambridge University Library - http://data.lib.cam.ac.uk - transformation >> made available (in code) from same site >> Open University - http://data.open.ac.uk - specific transform for materials >> related to teaching, code available at >> http://code.google.com/p/luceroproject/source/browse/trunk%20luceroproject/OULinkedData/src/uk/ac/open/kmi/lucero/rdfextractor/RDFExtractor.java >> (MARC transform is in libraryRDFExtraction method) >> COPAC - small set of records from the COPAC Union catalogue - data and >> transform not yet published >> Podes Projekt - LinkedAuthors - documentation at >> http://bibpode.no/linkedauthors/doc/Pode-LinkedAuthors-Documentation.pdf - 2 >> stage transformation firstly from MARC to FRBRized version of data, then >> from FRBRized data to RDF. 
These linked from documentation >> Podes Project - LinkedNonFiction - documentation at >> http://bibpode.no/linkednonfiction/doc/Pode-LinkedNonFiction-Documentation.pdf >> - MARC data transformed using xslt >> https://github.com/pode/LinkedNonFiction/blob/master/marcslim2n3.xsl >> >> British Library British National Bibliography - >> http://www.bl.uk/bibliographic/datafree.html - data model documented, but no >> code available >> Libris.se - some notes in various presentations/blogposts (e.g. >> http://dc2008.de/wp-content/uploads/2008/09/malmsten.pdf) but can't find >> explicit transformation >> Hungarian National library - >> http://thedatahub.org/dataset/hungarian-national-library-catalog and >> http://nektar.oszk.hu/wiki/Semantic_web#Implementation - some information on >> ontologies used but no code or explicit transformation (not 100% sure this >> is from MARC) >> Talis - implemented in several live catalogues including >> http://catalogue.library.manchester.ac.uk/ - no documentation or code afaik >> although some notes in >> >> MAB transformation >> HBZ - some of the transformation documented at >> https://wiki1.hbz-nrw.de/display/SEM/Converting+the+Open+Data+from+the+hbz+to+BIBO, >> don't think any code published? >> >> Would be really helpful if more projects published their transformations (or >> someone told me where to look!) >> >> Owen >> >> Owen Stephens >> Owen Stephens Consulting >> Web: http://www.ostephens.com >> Email: o...@ostephens.com >> Telephone: 0121 288 6936 >> >> On 26 Nov 2011, at 15:58, Karen Coyle wrote: >> >>> A few of the code4lib talk proposals mention projects that have or will >>> transform MARC records into RDF. If any of you have documentation and/or >>> examples of this, I would be very interested to see them, even if they are >>> "under construction." >>> >>> Thanks, >>> kc >>> >>> -- >>> Karen Coyle >>> kco...@kcoyle.net http://kcoyle.net >>> ph: 1-510-540-7596 >>> m: 1-510-435-8234 >>> skype: kcoylenet
Re: [CODE4LIB] Models of MARC in RDF
I'd suggest that rather than shove it in a triple it might be better to point at alternative representations, including MARC if desirable (keep meaning to blog some thoughts about progressively enhanced metadata...) Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 6 Dec 2011, at 15:44, Karen Coyle wrote: > Quoting "Fleming, Declan" : > >> Hi - I'll note that the mapping decisions were made by our metadata services >> (then Cataloging) group, not by the tech folks making it all work, though we >> were all involved in the discussions. One idea that came up was to do a, >> perhaps, lossy translation, but also stuff one triple with a text dump of >> the whole MARC record just in case we needed to grab some other element out >> we might need. We didn't do that, but I still like the idea. Ok, it was my >> idea. ;) > > I like that idea! Now that "disk space" is no longer an issue, it makes good > sense to keep around the "original state" of any data that you transform, > just in case you change your mind. I hadn't thought about incorporating the > entire MARC record string in the transformation, but as I recall the average > size of a MARC record is somewhere around 1K, which really isn't all that > much by today's standards. > > (As an old-timer, I remember running the entire Univ. of California union > catalog on 35 megabytes, something that would now be considered a smallish > email attachment.) > > kc > >> >> D >> >> -Original Message- >> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Esme >> Cowles >> Sent: Monday, December 05, 2011 11:22 AM >> To: CODE4LIB@LISTSERV.ND.EDU >> Subject: Re: [CODE4LIB] Models of MARC in RDF >> >> I looked into this a little more closely, and it turns out it's a little >> more complicated than I remembered. We built support for transforming to >> MODS using the MODS21slim2MODS.xsl stylesheet, but don't use that. 
Instead, >> we use custom Java code to do the mapping. >> >> I don't have a lot of public examples, but there's at least one public >> object which you can view the MARC from our OPAC: >> >> http://roger.ucsd.edu/search/.b4827884/.b4827884/1,1,1,B/detlmarc~1234567&FF=&1,0, >> >> The public display in our digital collections site: >> >> http://libraries.ucsd.edu/ark:/20775/bb0648473d >> >> The RDF for the MODS looks like: >> >> >>local >>FVLP 222-1 >> >> >>ARK >> >> http://libraries.ucsd.edu/ark:/20775/bb0648473d >> >> >>Brown, Victor W >>personal >> >> >>Amateur Film Club of San Diego >>corporate >> >> >>[196-] >> >> >>2005 >>Film and Video Library, University of California, >> San Diego, La Jolla, CA 92093-0175 >> http://orpheus.ucsd.edu/fvl/FVLPAGE.HTM >> >> >>reformatted digital >>16mm; 1 film reel (25 min.) :; sd., col. ; >> >> >>lcsh >>Ranching >> >> >> etc. >> >> >> There is definitely some loss in the conversion process -- I don't know >> enough about the MARC leader and control fields to know if they are captured >> in the MODS and/or RDF in some way. But there are quite a few local and >> note fields that aren't present in the RDF. Other fields (e.g. 300 and 505) >> are mapped to MODS, but not displayed in our access system (though they are >> indexed for searching). >> >> I agree it's hard to quantify lossy-ness. Counting fields or characters >> would be the most objective, but has obvious problems with control >> characters sometimes containing a lot of information, and then the relative >> importance of different fields to the overall description. There are other >> issues too -- some fields in this record weren't migrated because they >> duplicated collection-wide values, which are formulated slightly differently >> from the MARC record. Some fields weren't migrated because they concern the >> physical object, and therefore don't really apply to the digital object. So >> that really seems like a morass
Re: [CODE4LIB] Models of MARC in RDF
I think the strength of adopting RDF is that it doesn't tie us to a single vocab/schema. That isn't to say it isn't desirable for us to establish common approaches, but that we need to think slightly differently about how this is done - more application profiles than 'one true schema'. This is why RDA worries me - because it (seems to?) suggests that we define a schema that stands alone from everything else and that is used by the library community. I'd prefer to see the library community adopting the best of what already exists and then enhancing where the existing ontologies are lacking. If we are going to have a (web of) linked data, then re-use of ontologies and IDs is needed. For example, in the work I did at the Open University in the UK we ended up using only a single property from a specific library ontology (the draft ISBD http://metadataregistry.org/schemaprop/show/id/1957.html "has place of publication, production, distribution"). I think it is interesting that many of the MARC->RDF mappings so far have adopted many of the same ontologies (although no doubt partly because there is a 'follow the leader' element to this - or at least there was for me when looking at the transformation at the Open University) Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 5 Dec 2011, at 18:56, Jonathan Rochkind wrote: > On 12/5/2011 1:40 PM, Karen Coyle wrote: >> >> This brings up another point that I haven't fully grokked yet: the use of >> MARC kept library data "consistent" across the many thousands of libraries >> that had MARC-based systems. > > Well, only somewhat consistent, but, yeah. > >> What happens if we move to RDF without a standard? Can we rely on linking to >> provide interoperability without that rigid consistency of data models? > > Definitely not. I think this is a real issue. 
There is no magic to "linking" > or RDF that provides interoperability for free; it's all about the > vocabularies/schemata -- whether in MARC or in anything else. (Note > different national/regional library communities used different schemata in > MARC, which made interoperability infeasible there. Some still do, although > gradually people have moved to Marc21 precisely for this reason, even when > Marc21 was less powerful than the MARC variant they started with). > > That is to say, if we just used MARC's own implicit vocabularies, but output > them as RDF, sure, we'd still have consistency, although we wouldn't really > _gain_ much.On the other hand, if we switch to a new better vocabulary -- > we've got to actually switch to a new better vocabulary. If it's just > "whatever anyone wants to use", we've made it VERY difficult to share data, > which is something pretty darn important to us. > > Of course, the goal of the RDA process (or one of em) was to create a new > schema for us to consistently use. That's the library community effort to > maintain a common schema that is more powerful and flexible than MARC. If > people are using other things instead, apparently that failed, or at least > has not yet succeeded.
Re: [CODE4LIB] Models of MARC in RDF
Fair point. Just instinct on my part that putting it in a triple is a bit ugly :) It probably doesn't make any difference, although I don't think storing in a triple ensures that it sticks to the object (you could store the triple anywhere as well) Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 6 Dec 2011, at 22:43, Fleming, Declan wrote: > Hi - point at it where? We could point back to the library catalog that we > harvested in the MARC to MODS to RDF process, but what if that goes away? > Why not write ourselves a 1K insurance policy that sticks with the object for > its life? > > D > > -Original Message- > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Owen > Stephens > Sent: Tuesday, December 06, 2011 8:06 AM > To: CODE4LIB@LISTSERV.ND.EDU > Subject: Re: [CODE4LIB] Models of MARC in RDF > > I'd suggest that rather than shove it in a triple it might be better to point > at alternative representations, including MARC if desirable (keep meaning to > blog some thoughts about progressively enhanced metadata...) > > Owen > > Owen Stephens > Owen Stephens Consulting > Web: http://www.ostephens.com > Email: o...@ostephens.com > Telephone: 0121 288 6936 > > On 6 Dec 2011, at 15:44, Karen Coyle wrote: > >> Quoting "Fleming, Declan" : >> >>> Hi - I'll note that the mapping decisions were made by our metadata >>> services (then Cataloging) group, not by the tech folks making it all >>> work, though we were all involved in the discussions. One idea that >>> came up was to do a, perhaps, lossy translation, but also stuff one >>> triple with a text dump of the whole MARC record just in case we >>> needed to grab some other element out we might need. We didn't do >>> that, but I still like the idea. Ok, it was my idea. ;) >> >> I like that idea! 
Now that "disk space" is no longer an issue, it makes good >> sense to keep around the "original state" of any data that you transform, >> just in case you change your mind. I hadn't thought about incorporating the >> entire MARC record string in the transformation, but as I recall the average >> size of a MARC record is somewhere around 1K, which really isn't all that >> much by today's standards. >> >> (As an old-timer, I remember running the entire Univ. of California >> union catalog on 35 megabytes, something that would now be considered >> a smallish email attachment.) >> >> kc >> >>> >>> D >>> >>> -Original Message- >>> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf >>> Of Esme Cowles >>> Sent: Monday, December 05, 2011 11:22 AM >>> To: CODE4LIB@LISTSERV.ND.EDU >>> Subject: Re: [CODE4LIB] Models of MARC in RDF >>> >>> I looked into this a little more closely, and it turns out it's a little >>> more complicated than I remembered. We built support for transforming to >>> MODS using the MODS21slim2MODS.xsl stylesheet, but don't use that. >>> Instead, we use custom Java code to do the mapping. >>> >>> I don't have a lot of public examples, but there's at least one public >>> object which you can view the MARC from our OPAC: >>> >>> http://roger.ucsd.edu/search/.b4827884/.b4827884/1,1,1,B/detlmarc~123 >>> 4567&FF=&1,0, >>> >>> The public display in our digital collections site: >>> >>> http://libraries.ucsd.edu/ark:/20775/bb0648473d >>> >>> The RDF for the MODS looks like: >>> >>> >>> local >>> FVLP 222-1 >>> >>> >>> ARK >>> >>> http://libraries.ucsd.edu/ark:/20775/bb0648473d >>> >>> >>> Brown, Victor W >>> personal >>> >>> >>> Amateur Film Club of San Diego >>> corporate >>> >>> >>> [196-] >>> >>> >>> 2005 >>> Film and Video Library, University of California, >>> San Diego, La Jolla, CA 92093-0175 >>> http://orpheus.ucsd.edu/fvl/FVLPAGE.HTM >>> >>> >>> reformatted digital >>> 16mm; 1 film reel (25 min.) :; sd., col.
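The two options being debated here (stuff the raw MARC into a triple "just in case", versus point at an alternative representation) could look roughly like this in Turtle. A sketch only: the URIs are invented, and `ex:sourceRecord` is a made-up property for illustration; `dcterms:hasFormat` is a real DCMI term.

```turtle
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.org/vocab/> .

# Option 1: keep the whole MARC record as a literal on the object
<http://example.org/object/bb0648473d>
    ex:sourceRecord "00714cam a2200205 a 4500..." .

# Option 2: point at an alternative (MARC) representation instead
<http://example.org/object/bb0648473d>
    dcterms:hasFormat <http://example.org/object/bb0648473d.mrc> .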
Re: [CODE4LIB] Models of MARC in RDF
When I did a project converting records from UKMARC -> MARC21 we kept the UKMARC records for a period (about 5 years I think) while we assured ourselves that we hadn't missed anything vital. We did occasionally refer back to the older record to check things, but having not found any major issues with the conversion after that period we felt confident disposing of the record. This is the type of usage I was imagining for a copy of the MARC record in this scenario. Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 7 Dec 2011, at 01:52, Montoya, Gabriela wrote: > One critical thing to consider with MARC records (or any metadata, for that > matter) is that it they are not stagnant, so what is the value of storing > entire record strings into one triple if we know that metadata is volatile? > As an example, UCSD has over 200,000 art images that had their metadata > records ingested into our local DAMS over five years ago. Since then, many of > these records have been edited/massaged in our OPAC (and ARTstor), but these > updated records have not been refreshed in our DAMS. Now we find ourselves > needing to desperately have the "What is our database of record?" > conversation. > > I'd much rather see resources invested in data synching than spending it in > saving text dumps that will most likely not be referred to again. > > Dream Team for Building a MARC > RDF Model: Karen Coyle, Alistair Miles, > Diane Hillman, Ed Summers, Bradley Westbrook. 
> > Gabriela > > -Original Message- > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen > Coyle > Sent: Tuesday, December 06, 2011 7:44 AM > To: CODE4LIB@LISTSERV.ND.EDU > Subject: Re: [CODE4LIB] Models of MARC in RDF > > Quoting "Fleming, Declan" : > >> Hi - I'll note that the mapping decisions were made by our metadata >> services (then Cataloging) group, not by the tech folks making it all >> work, though we were all involved in the discussions. One idea that >> came up was to do a, perhaps, lossy translation, but also stuff one >> triple with a text dump of the whole MARC record just in case we >> needed to grab some other element out we might need. We didn't do >> that, but I still like the idea. Ok, it was my idea. ;) > > I like that idea! Now that "disk space" is no longer an issue, it makes good > sense to keep around the "original state" of any data that you transform, > just in case you change your mind. I hadn't thought about incorporating the > entire MARC record string in the transformation, but as I recall the average > size of a MARC record is somewhere around 1K, which really isn't all that > much by today's standards. > > (As an old-timer, I remember running the entire Univ. of California union > catalog on 35 megabytes, something that would now be considered a smallish > email attachment.) > > kc > >> >> D >> >> -Original Message- >> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf >> Of Esme Cowles >> Sent: Monday, December 05, 2011 11:22 AM >> To: CODE4LIB@LISTSERV.ND.EDU >> Subject: Re: [CODE4LIB] Models of MARC in RDF >> >> I looked into this a little more closely, and it turns out it's a >> little more complicated than I remembered. We built support for >> transforming to MODS using the MODS21slim2MODS.xsl stylesheet, but >> don't use that. Instead, we use custom Java code to do the mapping. 
>> >> I don't have a lot of public examples, but there's at least one public >> object which you can view the MARC from our OPAC: >> >> http://roger.ucsd.edu/search/.b4827884/.b4827884/1,1,1,B/detlmarc~1234 >> 567&FF=&1,0, >> >> The public display in our digital collections site: >> >> http://libraries.ucsd.edu/ark:/20775/bb0648473d >> >> The RDF for the MODS looks like: >> >> >>local >>FVLP 222-1 >> >> >>ARK >> >> http://libraries.ucsd.edu/ark:/20775/bb0648473d >> >> >>Brown, Victor W >>personal >> >> >>Amateur Film Club of San Diego >>corporate >> >> >>[196-] >> >> >>2005 >>Film and Video Library, University of >> California, San Diego, La Jolla, CA 92093-0175 >> http://
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
On 7 Dec 2011, at 00:38, Alexander Johannesen wrote: > Hiya, > > Karen Coyle wrote: >> I wonder how easy it will be to >> manage a metadata scheme that has cherry-picked from existing ones, so >> something like: >> >> dc:title >> bibo:chapter >> foaf:depiction > > Yes, you're right in pointing out this as a problem. And my answer is; > it's complicated. My previous "rant" on this list was about data > models*, and dangnabbit if this isn't related as well. > > What your example is doing is pointing out a new model based on bits > of other models. This works fine, for the most part, when the concepts > are simple; simple to understand, simple to extend. Often you'll find > that what used to be unclear has grown clear over time (as more and > more have used FOAF, you'll find some things are more used and better > understood, while other parts of it fade into 'we don't really use > that anymore') > > But when things get complicated, it *can* render your model unusable. > Mixed data models can be good, but can also lead directly to meta data > hell. For example ; > > dc:title > foaf:title > > Ouch. Although not a biggie, I see this kind of discrepancy all the > time, so the argument against mixed models is of course that the power > of definition lies with you rather than some third-party that might > change their mind (albeit rare) or have similar terms that differ > (more often). > > I personally would say that the library world should define RDA as you > need it to be, and worry less about reuse at this stage unless you > know for sure that the external models do bibliographic meta data > well. > I agree this is a risk, and I suspect there is a further risk around simply the feeling of 'ownership' by the community - perhaps it is easier to feel ownership over an entire ontology than an 'application profile' of some kind. 
It may be that mapping is the solution to this, but if this is really going to work I suspect it needs to be done from the very start - otherwise it is just another crosswalk, and we'll get varying views on how much one thing maps to another (but perhaps that's OK - I'm not looking for perfection). That said, I believe we absolutely need to be aiming for a world in which we work with mixed ontologies - no matter what we do, other, relevant, data sources will use FOAF, Bibo etc. I'm convinced that this gives us the opportunity to stop treating what are very mixed materials in a single way, while still exploiting common properties. For example, musical materials are really not well catered for in MARC, and we know there are real issues with applying FRBR to them - and I see the implementation of RDF/Linked Data as an opportunity to tackle this issue by adopting alternative ontologies where it makes sense, while still assigning common properties (dc:title) where this makes sense. > HOWEVER! > > When we're done talking about ontologies and vocabularies, we need to > talk about identifiers, and there I would swing the other way and let > reuse govern, because it is when you reuse an identifier you start > thinking about what that identifiers means to *both* parties. Or, put > differently ; > > It's remarkably easier to get this right if the identifier is a > number, rather than some word. And for that reason I'd say reuse > identifiers (subject proxies) as they are easier to get right and > bring a lot of benefits, but not ontologies (model proxies) as they > can be very difficult to get right and don't necessarily give you what > you want. Agreed :)
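A minimal Turtle sketch of the mixed-ontology approach discussed in this thread: a hypothetical musical score described with Bibo and FOAF terms while still carrying the common dc:title property. The resource URIs are invented for illustration.

```turtle
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix bibo: <http://purl.org/ontology/bibo/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://example.org/score/123>
    a bibo:Document ;                              # Bibo for the document type
    dc:title "String Quartet No. 2" ;              # common, shared property
    foaf:maker <http://example.org/person/456> .   # FOAF for the agent link
```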
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
The other issue that the 'modelling' brings (IMO) is that the model influences use - or better the other way round, the intended use and/or audience should influence the model. This raises questions for me about the value of a 'neutral' model - which is what I perceive libraries as aiming for - treating users as a homogenous mass with needs that will be met by a single approach. Obviously there are resource implications to developing multiple models for different uses/audiences, and once again I'd argue that an advantage of the linked data approach is that it allows for the effort to be distributed amongst the relevant communities. To be provocative - has the time come for us to abandon the idea that 'libraries' act as one where cataloguing is concerned, and our metadata serves the same purpose in all contexts? (I can't decide if I'm serious about this or not!) Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 11 Dec 2011, at 23:47, Karen Coyle wrote: > Quoting Richard Wallis : > > >> You get the impression that the BL "chose a subset of their current >> bibliographic data to expose as LD" - it was kind of the other way around. >> Having modeled the 'things' in the British National Bibliography domain >> (plus those in related domain vocabularis such as VIAF, LCSH, Geonames, >> Bio, etc.), they then looked at the information held in their [Marc] bib >> records to identify what could be extracted to populate it. > > Richard, I've been thinking of something along these lines myself, especially > as I see the number of "translating X to RDF" projects go on. I begin to > wonder what there is in library data that is *unique*, and my conclusion is: > not much. Books, people, places, topics: they all exist independently of > libraries, and libraries cannot take the credit for creating any of them. 
So > we should be able to say quite a bit about the resources in libraries using > shared data points -- and by that I mean, data points that are also used by > others. So once you decide on a model (as BL did), then it is a matter of > looking *outward* for the data to re-use. > > I maintain, however, as per my LITA Forum talk [1] that the subject headings > (without talking about quality thereof) and classification designations that > libraries provide are an added value, and we should do more to make them > useful for discovery. > > >> >> I know it is only semantics (no pun intended), but we need to stop using >> the word 'record' when talking about the future description of 'things' or >> entities that are then linked together. That word has so many built in >> assumptions, especially in the library world. > > I'll let you battle that one out with Simon :-), but I am often at a loss for > a better term to describe the unit of metadata that libraries may create in > the future to describe their resources. Suggestions highly welcome. > > kc > [1] http://kcoyle.net/presentations/lita2011.html > > > > > > -- > Karen Coyle > kco...@kcoyle.net http://kcoyle.net > ph: 1-510-540-7596 > m: 1-510-435-8234 > skype: kcoylenet
Re: [CODE4LIB] Namespace management, was Models of MARC in RDF
On 11 Dec 2011, at 23:30, Richard Wallis wrote: > > There is no document I am aware of, but I can point you at the blog post by > Tim Hodson [ > http://consulting.talis.com/2011/07/british-library-data-model-overview/] > who helped the BL get to grips with and start thinking Linked Data. > Another by the BL's Neil Wilson [ > http://consulting.talis.com/2011/10/establishing-the-connection/] filling > in the background around his recent presentations about their work. Neil Wilson at the BL has indicated a few times that in principle the BL has no problem sharing the software they used to extract the relevant data from the MARC records, but that there are licensing issues around the s/w due to the use of a proprietary compiler (sorry, I don't have any more details so I can't explain any more than this). I'm not sure whether this extends to sharing the source that would tell us what exactly was happening, but I think this would be worth more discussion with Neil - I'll try to pursue it with him when I get a chance Owen
Re: [CODE4LIB] creating call number browse
It just seems like if you've got Endeca doing the heavy lifting already, then building something separate just to allow you to enter a specific point in a sorted results list sounds like hard work? Two possible approaches that occur to me (and of course not knowing Endeca they may be well off base I guess). Can Endeca retrieve all records with a call number, and drop the user into a specific point in the sorted results set? I'm guessing not, otherwise you probably wouldn't be looking for alternative approaches. Is the problem dropping the user in at the right point in the sorted results set, or in the size of the results set generated? An alternative approach possibly? If Endeca can retrieve results and display them in Call Number order, then could you not submit a search that retrieves a 'shelf' of books at a time? That is, take a Call Number as an input, calculate a range around the call number to search and pass this to Endeca? This allows you to control the set size, but still there is a question of whether Endeca can drop the user into a specific point within a sorted results set. If not, then can it return records in a format that you can then manipulate (e.g. XML)? With a small, pre-sorted, results set, it should be relatively easy to build something that drops the user into the correct point based on their search? Owen Owen Stephens Assistant Director: eStrategy and Information Resources Central Library Imperial College London South Kensington Campus London SW7 2AZ t: +44 (0)20 7594 8829 e: [EMAIL PROTECTED] > -Original Message- > From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of > Emily Lynema > Sent: 21 September 2008 16:38 > To: CODE4LIB@LISTSERV.ND.EDU > Subject: Re: [CODE4LIB] creating call number browse > > Well, we're using LC and SUDOC here. 
What I really want is something > that is both searchable and browsable, so that users can type in a call > number and then browse backward and forward as much as they want in > call > number order. > > We have Endeca here, so my patrons can browse into the LC scheme and > then sort the results in call number order, but I don't have a way to > browse forward and backward starting with a specific call number (like > you would if you were browsing the shelves physically). > > -emily > > Keith Jenkins wrote: > > Emily, > > > > Are you using LC or Dewey? > > > > A while back, I wanted to generate browsable lists of new books, > > organized by topic. I ended up using the LC call number to group the > > titles into manageable groups. Here's an example: > > http://supportingcast.mannlib.cornell.edu/newbooks/?loc=mann > > > > Titles are sorted by call number, and also grouped by the initial > > letters of the LC classification, such as "Q" or "QL". For monthly > > lists of new books, most groupings usually have less than 20 titles, > > which makes for easy browsing of titles within someone's general > > subject of interest. The Table of Contents at the top of the page > > only lists those classifications that are present in the set of > titles > > currently being viewed. (In an earlier version, Q would only be > split > > into QA, QB, etc. if there were more than 20 items with Q call > > numbers.) > > > > Things do tend to get a bit out of control in some of the > > classifications for literature... no one wants to scan through a list > > of 452 titles: > > http://supportingcast.mannlib.cornell.edu/newbooks/?class=PL > > > > So for entire collections, a lot more work would be needed to create > > finer subgroups, since each classification is uniquely complex. 
For > > example: > > PL1-8844 : Languages of Eastern Asia, Africa, Oceania > > PL1-481 : Ural-Altaic languages > > PL21-396 : Turkic languages > > PL400-431 : Mongolian languages > > PL450-481 : Tungus Manchu languages > > > > (An idea... maybe it would work to simply forget about pre- > determined, > > named call number ranges and look for "natural breaks" in the call > > numbers, rather than trying to model the intricate details of each > > individual classification schedule.) > > > > The site runs on a set of MARC records extracted from the catalog. > > Users can also subscribe to RSS feeds for any combination of > location, > > language, or classification group. > > > > I did some early experimentation to include cover images, but never > > seemed to get enough matches to make that worthwhile. > > > > Keith > > > > Keith Jenkins > > GIS/Geospatial Applications Librari
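The "shelf of books at a time" idea suggested above can be sketched outside Endeca: given a pre-sorted list of call numbers, a binary search finds the drop-in point and a slice returns one shelf. This is a sketch under a big assumption - the call numbers are already normalized so that plain string comparison matches shelf order, which real LC call numbers need an extra normalization step for.

```python
import bisect

def shelf_window(call_numbers, start, size=10):
    """Return one 'shelf' of entries from a sorted call-number list,
    starting at the point where `start` would file.

    Assumes the list is sorted and normalized so plain string
    comparison matches shelf order (real LC call numbers need
    padding of class numbers etc. first).
    """
    i = bisect.bisect_left(call_numbers, start)  # drop-in point
    return call_numbers[i:i + size]              # one shelf's worth

shelf = ["QA76.73 .P98", "QA76.76 .D47", "QB54 .S65", "QL737 .C23"]
print(shelf_window(shelf, "QB", size=2))  # everything shelved from QB onwards
```

The same windowing also answers the result-set-size worry: the user only ever sees `size` records at a time, and paging backwards is just a slice ending at the drop-in point.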
Re: [CODE4LIB] Zotero, unapi, and formats?
At the moment Zotero development seems to be focusing on the use of RDFa using the Bibo ontology for picking up bib details from within pages (see discussion on the Bibo Google group) Owen On Tue, Apr 6, 2010 at 4:17 PM, Chad Fennell wrote: > > It's still a LOT better than COinS for Zotero, I assume though. > > Yes, if only because you get more complete metadata with things like > RIS than COinS does via OpenURL. I do like the theoretical benefit of > a metadata format request API , but the promise of richer metadata > (primarily for Zotero) was ultimately why I chose unAPI over COinS. > And yeah, better documentation would be nice, thanks for looking into > it. > > -Chad > -- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
[CODE4LIB] OCLC UK Mashathon and 'Liver and Mash' Mashed Library event (Liverpool, May 13-14)
Just a quick plug for the OCLC Mashathon event taking place in Liverpool on Thursday 13th May: http://mashlib2010.wordpress.com/ The Mashathon is being followed by a Mashed Library event ("Liver and Mash") the following day at the same venue. Delegates can attend either (or both) days. As Mashed Library has become a really popular event in the UK, the bookings for that are going quite quickly (we currently have 13 spaces left) but there's still plenty of room at the Mashathon. Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
[CODE4LIB] Bibliographic data on Freebase
Thought that some on the list might be interested in various discussions happening on the Freebase email list at the moment: Firstly some stuff on dealing with ISBNs http://lists.freebase.com/pipermail/freebase-discuss/2010-April/thread.html, and secondly (and more interesting I think) work on loading a University Library catalogue (see the link for emails with the subject "UniversityX Book Load<http://lists.freebase.com/pipermail/freebase-discuss/2010-April/001200.html>", and also this page http://wiki.freebase.com/wiki/UniversityX_Load), which includes work on mapping place of publication to existing Freebase location information ( http://lists.freebase.com/pipermail/freebase-discuss/2010-April/001218.html) Owen -- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
Re: [CODE4LIB] Bibliographic data on Freebase
I can't wait to get my hands on Gridworks - very excited (sad, I know), but think potential for checking/correcting metadata and adding links to relevant Freebase topics is very exciting If anyone hasn't seen the demos of this yet, suggest having a look at the screencasts at http://blog.freebase.com/2010/03/26/preview-freebase-gridworks/ Owen On Thu, Apr 22, 2010 at 2:45 PM, Sean Hannan wrote: > Also, David Huynh (of Gridworks and Freebase Parallax fame) dropped into > IRC last week asking about MARC4J and its possible use with Gridworks. > > Things are afoot. > > -Sean > > > On Apr 22, 2010, at 7:27 AM, Owen Stephens wrote: > > > Thought that some on the list might be interested in various discussions > > happening on the Freebase email list at the moment: > > > > Firstly some stuff on dealing with ISBNs > > > http://lists.freebase.com/pipermail/freebase-discuss/2010-April/thread.html > , > > and secondly (and more interesting I think) work on loading a University > > Library catalogue (see the link for emails with the subject "UniversityX > > Book Load< > http://lists.freebase.com/pipermail/freebase-discuss/2010-April/001200.html > >", > > and also this page http://wiki.freebase.com/wiki/UniversityX_Load), > which > > includes work on mapping place of publication to existing Freebase > location > > information ( > > > http://lists.freebase.com/pipermail/freebase-discuss/2010-April/001218.html > ) > > > > Owen > > > > -- > > Owen Stephens > > Owen Stephens Consulting > > Web: http://www.ostephens.com > > Email: o...@ostephens.com > -- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
Re: [CODE4LIB] Twitter annotations and library software
We've had problems with RIS on a recent project. Although there is a specification (http://www.refman.com/support/risformat_intro.asp), it is (I feel) lacking enough rigour to ever be implemented consistently. The most common issue in the wild that I've seen is use of different tags for the same information (which the specification does not nail down enough to know when each should be used): Use of TI or T1 for primary title Use of AU or A1 for primary author Use of UR, L1 or L2 to link to 'full text' Perhaps more significantly the specification doesn't include any field specifically for a DOI, but despite this EndNote (owned by ISI ResearchSoft, who are also responsible for the RIS format specification) includes the DOI in a DO field in its RIS output - not to specification. Owen On Wed, Apr 28, 2010 at 9:17 AM, Jakob Voss wrote: > Hi > > it's funny how quickly you vote against BibTeX, but at least it is a format > that is frequently used in the wild to create citations. If you call BibTeX > undocumented and garbage then how do you call MARC which is far more > difficult to make use of? > > My assumption was that there is a specific use case for bibliographic data > in twitter annotations: > > I. Identifiy publication => this can *only* be done seriously with > identifiers like ISBN, DOI, OCLCNum, LCCN etc. > > II. Deliver a citation => use a citation-oriented format (BibTeX, CSL, RIS) > > I was not voting explicitly for BibTeX but at least there is a large > community that can make use of it. I strongly favour CSL ( > http://citationstyles.org/) because: > > - there is a JavaScript CSL-Processor. 
JavaScript is kind of a punishment > but it is the natural environment for the Web 2.0 Mashup crowd that is going > to implement applications that use Twitter annotations > > - there are dozens of CSL citation styles so you can display a citation in > any way you want > > As Ross pointed out RIS would be an option too, but I miss the easy open > source tools that use RIS to create citations from RIS data. > > Any other relevant format that I know (Bibont, MODS, MARC etc.) does not > aim at identification or citation at the first place but tries to model the > full variety of bibliographic metadata. If your use case is > > III. Provide semantic properties and connections of a publication > > Then you should look at the Bibliographic Ontology. But III does *not* > "just subsume" usecase II. - it is a different story that is not beeing told > by normal people but only but metadata experts, semantic web gurus, library > system developers etc. (I would count me to this groups). If you want such > complex data then you should use other systems but Twitter for data exchange > anyway. 
> > A list of CSL metadata fields can be found at > > http://citationstyles.org/downloads/specification.html#appendices > > and the JavaScript-Processor (which is also used in Zotero) provides more > information for developers: http://groups.google.com/group/citeproc-js > > Cheers > Jakob > > P.S: An example of a CSL record from the JavaScript client: > > { > "title": "True Crime Radio and Listener Disenchantment with Network > Broadcasting, 1935-1946", > "author": [ { >"family": "Razlogova", >"given": "Elena" > } ], > "container-title": "American Quarterly", > "volume": "58", > "page": "137-158", > "issued": { "date-parts": [ [2006, 3] ] }, > "type": "article-journal" > > } > > > -- > Jakob Voß , skype: nichtich > Verbundzentrale des GBV (VZG) / Common Library Network > Platz der Goettinger Sieben 1, 37073 Göttingen, Germany > +49 (0)551 39-10242, http://www.gbv.de > -- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
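The tag variance Owen describes is simple to paper over with a normalization layer in a RIS reader. A minimal sketch, assuming the usual "TAG  - value" line layout and ER as record terminator; the synonym table covers only the variants mentioned above, and EndNote's out-of-spec DO tag is folded into a DOI key:

```python
# Minimal RIS reader that normalizes the tag variants discussed above.
# Assumes one "TAG  - value" pair per line, ER terminating each record.

TAG_SYNONYMS = {
    "T1": "TI",   # primary title
    "A1": "AU",   # primary author
    "L1": "UR",   # 'full text' link
    "L2": "UR",
    "DO": "DOI",  # EndNote's out-of-spec DOI field
}

def parse_ris(text):
    records, record = [], {}
    for line in text.splitlines():
        if len(line) < 6 or line[2:6] != "  - ":
            continue  # skip continuation/blank lines in this sketch
        tag, value = line[:2], line[6:].strip()
        if tag == "ER":
            records.append(record)
            record = {}
            continue
        tag = TAG_SYNONYMS.get(tag, tag)  # fold variants together
        record.setdefault(tag, []).append(value)
    return records

sample = ("TY  - JOUR\n"
          "T1  - The Structure of Ordinary Water\n"
          "A1  - Frank, H. S.\n"
          "DO  - 10.1126/science.169.3946.635\n"
          "ER  - \n")
recs = parse_ris(sample)
```

With the sample record, `recs[0]` carries the title under TI, the author under AU, and the DOI under DOI, regardless of which variant the exporter chose.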
Re: [CODE4LIB] Twitter annotations and library software
Unfortunately RefWorks only imports DO - it doesn't export it! We now recommend using RefWorks XML when exporting (for our project) - which is fine, but not publicly documented as far as I know :( Zotero recommends using BibTeX for importing from RefWorks, I think Owen On Wed, Apr 28, 2010 at 2:05 PM, Walker, David wrote: > I was also just working on DOI with RIS. > > It looks like both Endnote and Refworks recognize 'DO' for DOIs. But > apparently Zotero does not. If Zotero supported it, I'd say we'd have a de > facto standard on our hands. > > In fact, I couldn't figure out how to pass a DOI to Zotero using RIS. Or, > at least, in my testing I never saw the DOI show-up in Zotero. I don't > really use Zotero, so I may have missed it. > > --Dave > > == > David Walker > Library Web Services Manager > California State University > http://xerxes.calstate.edu > ____ > From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Owen > Stephens [o...@ostephens.com] > Sent: Wednesday, April 28, 2010 2:26 AM > To: CODE4LIB@LISTSERV.ND.EDU > Subject: Re: [CODE4LIB] Twitter annotations and library software > > We've had problems with RIS on a recent project. Although there is a > specification (http://www.refman.com/support/risformat_intro.asp), it is > (I > feel) lacking enough rigour to ever be implemented consistently. The most > common issue in the wild that I've seen is use of different tags for the > same information (which the specification does not nail down enough to know > when each should be used): > > Use of TI or T1 for primary title > Use of AU or A1 for primary author > Use of UR, L1 or L2 to link to 'full text' > > Perhaps more significantly the specification doesn't include any field > specifically for a DOI, but despite this EndNote (owned by ISI > ResearchSoft, > who are also responsible for the RIS format specification) includes the DOI > in a DO field in its RIS output - not to specification.
> > Owen > > On Wed, Apr 28, 2010 at 9:17 AM, Jakob Voss wrote: > > > Hi > > > > it's funny how quickly you vote against BibTeX, but at least it is a > format > > that is frequently used in the wild to create citations. If you call > BibTeX > > undocumented and garbage then how do you call MARC which is far more > > difficult to make use of? > > > > My assumption was that there is a specific use case for bibliographic > data > > in twitter annotations: > > > > I. Identify publication => this can *only* be done seriously with > > identifiers like ISBN, DOI, OCLCNum, LCCN etc. > > > > II. Deliver a citation => use a citation-oriented format (BibTeX, CSL, > RIS) > > > > I was not voting explicitly for BibTeX but at least there is a large > > community that can make use of it. I strongly favour CSL ( > > http://citationstyles.org/) because: > > > > - there is a JavaScript CSL-Processor. JavaScript is kind of a punishment > > but it is the natural environment for the Web 2.0 Mashup crowd that is > going > > to implement applications that use Twitter annotations > > > > - there are dozens of CSL citation styles so you can display a citation > in > > any way you want > > > > As Ross pointed out RIS would be an option too, but I miss the easy open > > source tools that use RIS to create citations from RIS data. > > > > Any other relevant format that I know (Bibont, MODS, MARC etc.) does not > > aim at identification or citation in the first place but tries to model > the > > full variety of bibliographic metadata. If your use case is > > > > III. Provide semantic properties and connections of a publication > > > > Then you should look at the Bibliographic Ontology. But III does *not* > > "just subsume" use case II - it is a different story that is not being > told > > by normal people but only by metadata experts, semantic web gurus, > library > > system developers etc. (I would count myself among these groups).
If you want > such > > complex data then you should use other systems but Twitter for data > exchange > > anyway. > > > > A list of CSL metadata fields can be found at > > > > http://citationstyles.org/downloads/specification.html#appendices > > > > and the JavaScript-Processor (which is also used in Zotero) provides more > > information for developers: http://groups.google.com/group/citeproc-js > > > > Cheers > > Jakob > > > > P.S: An example of a CSL record from the JavaScript client: >
Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)
Dead ends from OpenURL enabled hyperlinks aren't a result of the standard though, but rather an aspect of both the problem they are trying to solve, and the conceptual way they try to do this. I'd contend these dead ends are an implementation issue - and despite this I have to say that my experience on the ground is that feedback from library users on the use of link resolvers is positive - much more so than many of the other library systems I've been involved with. What I do see as a problem is that this market seems to have essentially stagnated. I suspect the reasons for this are complex, but it would be nice to see some more innovation in this area. Owen On Thu, Apr 29, 2010 at 6:14 PM, Ed Summers wrote: > On Thu, Apr 29, 2010 at 12:08 PM, Eric Hellman wrote: > > Since this thread has turned into a discussion on OpenURL... > > > > I have to say that during the OpenURL 1.0 standardization process, we > definitely had moments of despair. Today, I'm willing to derive satisfaction > from "it works" and overlook shortcomings. It might have been otherwise. > > Personally, I've followed enough OpenURL enabled hyperlink dead ends > to contest "it works". > > //Ed > -- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
Re: [CODE4LIB] Twitter annotations and library software
Alex, Could you expand on how you think the problem that OpenURL tackles would have been better approached with existing mechanisms? I'm not debating this necessarily, but from my perspective when OpenURL was first introduced it solved a real problem that I hadn't seen solved before. Owen On Thu, Apr 29, 2010 at 11:55 PM, Alexander Johannesen < alexander.johanne...@gmail.com> wrote: > Hi, > > On Thu, Apr 29, 2010 at 22:47, Walker, David wrote: > > I would suggest it's more because, once you step outside of the > > primary use case for OpenURL, you end-up bumping into *other* standards. > > These issues were raised all the way back when it was created, as well. I > guess it's easy to be clever in hindsight. :) Here's what I wrote > about it 5 years ago (http://shelter.nu/blog-159.html) ; > > So let's talk about 'Not invented here' first, because surely, we're > all guilty of this one from time to time. For example, lately I dug > into the ANSI/NISO Z39.88 -2004 standard, better known as OpenURL. I > was looking at it critically, I have to admit, comparing it to what I > already knew about Web Services, SOA, http, > Google/Amazon/Flickr/Del.icio.us API's, and various Topic Maps and > semantic web technologies (I was the technical editor of Explorers > Guide to the Semantic Web) > > I think I can sum up my experiences with OpenURL as such; why? Why > has the library world invented a new way of doing things that > can be done quite well already? Now, there is absolutely nothing wrong > with the standard per se (except a pretty darn awful choice of > name!!), so I'm not here criticising the technical merits and the work > put into it. No, it's a simple 'why' that I have yet to get a decent > answer to, even after talking to the OpenURL bigwigs about it. I mean, > come on; convince me! I'm not unreasonable, no truly, really, I just > want to be convinced that we need this over anything else.
> > > Regards, > > Alex > -- > Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps > --- http://shelter.nu/blog/ -- > -- http://www.google.com/profiles/alexander.johannesen --- > -- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
Re: [CODE4LIB] Twitter annotations and library software
Tim, I'd vote for adopting the same approach as COinS on the basis it already has some level of adoption, and we know it covers at least some of the stuff libraries and academic users (as used by both libraries and consumer tools such as Zotero) might want to do. We are talking Books (from what you've said), so we don't have to worry about other formats. (although it does mean we can do journal articles and some other stuff as well for no effort) Mendeley and Zotero already speak COinS, it is pretty simple, and there are already several code libraries to deal with it. It isn't where I hope we end up in the long term, but if we talk about this happening tomorrow, why not use something that is relatively simple, already has a good set of implementations, and we know works for several cases of embedding book metadata in a web environment? Owen On Thu, Apr 29, 2010 at 7:01 PM, Jakob Voss wrote: > Dear Tim, > > > you wrote: > >> So this is my recommended framework for proceeding. Tim, I'm afraid >>> you'll actually have to do the hard work yourself. >>> >> >> No, I don't. Because the work isn't fundamentally that hard. A >> complex standard might be, but I never for a moment considered >> anything like that. We have *512 bytes*, and it needs to be usable by >> anyone. Library technology is usually fatally over-engineered, but >> this is a case where that approach isn't even possible. >> > > Jonathan did a very good summary - you just have to pick what your main > focus of embedding bibliographic data is. > > > A) I favour using the CSL-Record format which I summarized at > > http://wiki.code4lib.org/index.php/Citation_Style_Language > > because I had in mind that people want to have a nice looking citation of > the publication that someone tweeted about.
The drawback is that CSL is less > adopted and will not always fit in 512 bytes > > > B) If your main focus is to link Tweets about the same publication (and > other stuff about this publication) then you must embed identifiers. > LibraryThing is mainly based on two identifiers > > 1) ISBN to identify editions > 2) LT work ids to identify works > > I wonder why LT work ids have not been picked up more although you thankfully > provide a full mapping to ISBN at > http://www.librarything.com/feeds/thingISBN.xml.gz but nevermind. I > thought that some LT records also contain other identifiers such as OCLC > number, LOC number etc. but maybe I am wrong. The best way to specify > identifiers is to use a URI (all relevant identifiers that I know have a > URI form). For ISBN it is > > urn:isbn:{ISBN13} > > For LT Work-ID you can use the URL with your .com top level domain: > > http://www.librarything.com/work/{LTWORKID} > > That would fit for tweets about books with an ISBN and for tweets about a > work which will make 99.9% of tweets from LT about single publications > anyway. > > > C) If your focus is to let people search for a publication in libraries > and to copy bibliographic data into reference management software, then > COinS is a way to go. COinS is based on OpenURL which I and others ranted > about because it is a crappy library standard like MARC. But unlike other > metadata formats COinS usually fits in less than 512 bytes. Furthermore you > may have to deal with it for LibraryThing for libraries anyway. > > > Although I strongly favour CSL as a practising library scientist and > developer I must admit that for LibraryThing the best way is to embed > identifiers (ISBN and LT Work-ID) and maybe COinS. As long as LibraryThing > does not open up to more complex publications like preprints of > proceeding-articles in series etc. but mainly deals with books and works > this will make LibraryThing users happy.
> > > Then, three years from now, we can all conference-tweet about a CIL talk, >> about all the cool ways libraries are using Twitter, and how it's such a >> shame that the annotations standard wasn't designed with libraries in mind. >> > > How about a bet instead of voting. In three years will there be: > > a) No relevant Twitter annotations anyway > b) Twitter annotations but not used much for bibliographic data > c) A rich variety of incompatible bibliographic annotation standards > d) Semantic Web will have solved every problem anyway > .. > > Cheers > Jakob > > -- > Jakob Voß , skype: nichtich > Verbundzentrale des GBV (VZG) / Common Library Network > Platz der Goettinger Sieben 1, 37073 Göttingen, Germany > +49 (0)551 39-10242, http://www.gbv.de > -- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
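A COinS span of the kind discussed in option C is just an empty HTML span whose title attribute carries an OpenURL 1.0 ContextObject in key-encoded-value form. A rough sketch for a book - the title, author, and ISBN below are made-up placeholders:

```python
# Sketch: build a COinS span for a book. The rft.* keys follow the
# OpenURL KEV book format used by COinS; the sample values are invented.
from urllib.parse import urlencode

def coins_for_book(title, author, isbn):
    params = {
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:book",
        "rft.genre": "book",
        "rft.btitle": title,
        "rft.au": author,
        "rft.isbn": isbn,
    }
    kev = urlencode(params).replace("&", "&amp;")  # escape for the HTML attribute
    return '<span class="Z3988" title="%s"></span>' % kev

span = coins_for_book("Example Book Title", "Stephens, Owen", "9780000000000")
```

Tools such as Zotero scan the page for `class="Z3988"` spans and parse the title attribute back into a citation, which is why the whole record has to fit in an attribute value.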
Re: [CODE4LIB] Twitter annotations and library software
Thanks Alex, This makes sense, and yes I see what you're saying - and yes, if you end up going back to custom coding because it's easier it does seem to defeat the purpose. However I'd argue that actually OpenURL 'succeeded' because it did manage to get some level of acceptance (ignoring the question of whether it is v0.1 or v1.0) - the cost of developing 'link resolvers' would have been much higher if we'd been doing something different for each publisher/platform. In this sense (I'd argue) sometimes crappy standards are better than none. We've used OpenURL v1.0 in a recent project and because we were able to simply pick up code already done for Zotero, and we already had an OpenURL resolver, the amount of new code we needed for this was minimal. I think the point about Link Resolvers doing stuff that Apache and CGI scripts were already doing is a good one - and I've argued before that what we actually should do is separate some of this out (a bit like Jonathan did with Umlaut) into an application that can answer questions about location (what is generally called the KnowledgeBase in link resolvers) and the applications that deal with analysing the context and the redirection (To introduce another tangent in a tangential thread, interestingly (I think!) I'm having a not dissimilar debate about Linked Data at the moment - there are many who argue that it is too complex and that as long as you have a nice RESTful interface you don't need to get bogged down in ontologies and RDF etc. I'm still struggling with this one - my instinct is that it will pay to standardise but so far I've not managed to convince even myself this is more than wishful thinking at the moment) Owen On Fri, Apr 30, 2010 at 10:33 AM, Alexander Johannesen < alexander.johanne...@gmail.com> wrote: > On Fri, Apr 30, 2010 at 18:47, Owen Stephens wrote: > > Could you expand on how you think the problem that OpenURL tackles would > > have been better approached with existing mechanisms?
> > As we all know, it's pretty much a spec for a way to template incoming > and outgoing URLs, defining some functionality along the way. As such, > URLs with basic URI templates and rewriting have been around for a > long time. Even longer than that is just the basics of HTTP which have > status codes and functionality to do exactly the same. We've been > doing link resolving since the mid-90's, either as CGI scripts, or as > Apache modules, so none of this was new. URI comes in, you look it up > in a database, you cross-check with other REQUEST parameters (or > sessions, if you must, as well as IP addresses) and pop out a 303 > (with some possible rewriting of the outgoing URL) (with the hack we > needed at the time to also create dummy pages with META tags > *shudder*). > > So the idea was to standardize on a way to do this, and it was a good > idea as such. OpenURL *could* have had a great potential if it > actually defined something tangible, something concrete like a model > of interaction or basic rules for fishing and catching tokens and the > like, and as someone else mentioned, the 0.1 version was quite a good > start. But by the time 1.0 came out, all the goodness had turned > so generic and flexible in such a complex way that handling it turned > you right off it. The standard also had a very difficult language, and > more specifically didn't use enough of the normal geeky language used > by sysadmins around. The more I tried to wrap my head around it, the > more I felt like just going back to CGI scripts that looked stuff up > in a database. It was easier to hack legacy code, which, well, defeats > the purpose, no? > > Also, forgive me if I've forgotten important details; I've suppressed > this part of my life.
:) > > > Kind regards, > > Alex > -- > Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps > --- http://shelter.nu/blog/ -- > -- http://www.google.com/profiles/alexander.johannesen --- > -- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
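The pattern Alex describes - citation parameters in, knowledge-base lookup, 303 out - can be sketched in a few lines. The knowledge base here is a hard-coded stand-in (keyed on the Science ISSN from earlier in the thread); a real resolver would also weigh the requester's entitlements and session context:

```python
# Sketch of the link-resolver pattern described above: look up the
# incoming OpenURL-style parameters and redirect to a full-text target.
from urllib.parse import parse_qs

# Hypothetical knowledge base: ISSN -> full-text platform URL
KNOWLEDGE_BASE = {
    "0036-8075": "https://www.science.org/",  # Science
}

def resolve(query_string):
    params = parse_qs(query_string)
    issn = params.get("rft.issn", [""])[0]
    target = KNOWLEDGE_BASE.get(issn)
    if target:
        return ("303 See Other", target)   # pop out a redirect
    return ("404 Not Found", None)         # no known location

status, location = resolve("rft.issn=0036-8075&rft.volume=169")
```

The interesting work in a production resolver is of course the knowledge base itself, which is exactly the component Owen suggests separating out.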
Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)
Although part of the problem is that you might want to offer any service on the basis of an OpenURL, the major use case is supply of a document (either online or via ILL) - so it strikes me you could look at DAIA http://www.gbv.de/wikis/cls/DAIA_-_Document_Availability_Information_API ? Jakob does this make sense? Owen On Fri, Apr 30, 2010 at 3:08 PM, Eric Hellman wrote: > OK, what does the EdSuRoSi spec for OpenURL responses say? > > Eric > > On Apr 30, 2010, at 9:40 AM, Ed Summers wrote: > > > On Fri, Apr 30, 2010 at 9:09 AM, Ross Singer > wrote: > >> I actually think this lack of any specified response format is a large > >> factor in the stagnation of OpenURL as a technology. Since a resolver > >> is under no obligation to do anything but present a web page it's > >> difficult for local entrepreneurial types to build upon the > >> infrastructure simply because there are no guarantees that it will > >> work anywhere else (or even locally, depending on your vendor, I > >> suppose), much less contribute back to the ecosystem. > > > > I agree. And that's an issue with the standard, not the implementations. > > > > //Ed > > Eric Hellman > President, Gluejar, Inc. > 41 Watchung Plaza, #132 > Montclair, NJ 07042 > USA > > e...@hellman.net > http://go-to-hellman.blogspot.com/ > -- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
[CODE4LIB] Aquabrowser, SRU/SRW and other output formats
In this document http://www.docstoc.com/docs/2286293/AquaBrowser-Library-FAQ-ADA-Compliant (and I've no idea what the source of this is) it says: "For data output, Aquabrowser supports RSS using either Dublin Core (DC) or MarcXML metadata schemas. Also search results can be retrieved in SRU/SRW with either Dublin Core (DC) or MarcXML schemas. Communication to the client browser supports XML and JSON formats" Does anyone know if this is true? If so, is this part of the basic product or an add-on? My local public library has recently started to use Aquabrowser, and I'm interested in whether I can get access to search results etc. in a nice format for reuse etc. Thanks Owen -- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
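If the FAQ is accurate, a standard SRU searchRetrieve request should work against whatever base URL the installation exposes. A sketch of building such a request - the base URL is hypothetical, the parameter names are standard SRU 1.1, and "dc"/"marcxml" are the schemas the FAQ claims:

```python
# Sketch: construct an SRU 1.1 searchRetrieve URL for a hypothetical
# endpoint, asking for Dublin Core records as the FAQ suggests.
from urllib.parse import urlencode

def sru_url(base, query, schema="dc", maximum=10):
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": query,               # CQL query string
        "recordSchema": schema,       # "dc" or "marcxml" per the FAQ
        "maximumRecords": maximum,
    }
    return base + "?" + urlencode(params)

url = sru_url("http://opac.example.org/sru", 'dc.title="open data"')
```

Fetching that URL from a conforming server returns a searchRetrieveResponse XML document with the records in the requested schema.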
Re: [CODE4LIB] locator
Hi Tom, The mapping the library project started out (in my head) as simply using existing mapping tools to provide an interface to a map. The way the project went when we sat down and played for a day was slightly different, although still vaguely interesting :) The thinking behind using Google Maps (which would apply to other 'mapping' interfaces - e.g. OpenLayers) was simply you get a set of tools that are designed to help navigation round a physical space. You can dispense with the geographic representation and simply use your own floorplan images. Whether this is the way to go probably depends on your requirements - but you would get functions like the ability to drop markers etc. 'for free' as it were, and also a well documented approach as the GMaps etc APIs come with good documentation. However, more than once it has been suggested that this is a more complex approach than is required (I'm still not convinced by this - I think there are real strengths to this 'off the shelf' approach) Some other bits and pieces that may be of interest: My writeup of the day we worked on the Mapping the Library project http://www.meanboyfriend.com/overdue_ideas/2009/12/mashing-and-mapping/ A JISC funded project to look at producing 'item locator' service at the LSE http://findmylibrarybook.blogspot.com/ Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 30 Jun 2010, at 13:24, Tom Vanmechelen wrote: > We're considering to expand our service with a item locator. "Mapping the > library" (http://mashedlibrary.com/wiki/index.php?title=Mapping_the_library) > describes how to build this with Google maps. But is this really the way to > go? Does anyone has any experience with this? Does anyone have some best > practices for this kind of project knowing that we have about 20 buildings > spread all over the town? > > Tom > > --- > Tom Vanmechelen > > K.U.Leuven / LIBIS > W. 
De Croylaan 54 bus 5592 > BE-3001 Heverlee > Tel +32 16 32 27 93
Re: [CODE4LIB] DIY aggregate index
As others have suggested I think much of this is around the practicalities of negotiating access, and the server power & expertise needed to run the service - it is simply more efficient to do this in one place. For me the change that we need to open this up is for publishers to start pushing out a lot more of this data to all comers, rather than having to have this conversation several times over with individual sites or suppliers. How practical this is I'm not sure - especially as we are talking about indexing full-text where available (I guess). I think the Google News model (5-clicks free) is an interesting one - but not sure whether this, or a similar approach, would work in a niche market which may not be so interested in total traffic. It seems (to me) so obviously in the publishers' interest for their content to be as easily discoverable as possible that I am optimistic they will gradually become more open to sharing more data that aids this - at least metadata. I'd hope that this would eventually open up the market to a broader set of suppliers, as well as institutions doing their own thing. Owen On Thu, Jul 1, 2010 at 2:37 AM, Eric Lease Morgan wrote: > On Jun 30, 2010, at 8:43 PM, Blake, Miriam E wrote: > > > We have locally loaded records from the ISI databases, INSPEC, > > BIOSIS, and the Department of Energy (as well as from full-text > > publishers, but that is another story and system entirely.) Aside > > from the contracts, I can also attest to the major amount of > > work it has been. We have 95M bibliographic records, stored in > > > 75TB of disk, and counting. Its all running on SOLR, with a local > > interface and the distributed aDORe repository on backend. ~ 2 > > FTE keep it running in production now. > > > I definitely think what is outlined above -- local indexing -- is the way > to go in the long run. Get the data. Index it. Integrate it into your other > system. Know that you have it when you change or drop the license. No > renting of data.
And, "We don't need no stinkin' interfaces!" I believe a > number of European institutions have been doing this for a number of years. > I hear a few of us in the United States following suit. ++ > > -- > Eric Morgan > University of Notre Dame. > -- Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com
Re: [CODE4LIB] "universal citation index"
Since no one has mentioned it yet, and it seems like it might be relevant, it may be worth looking at the CITO (Citation Ontology) (see http://imageweb.zoo.ox.ac.uk/pub/2008/publications/Shotton_ISMB_BioOntology_CiTO_final_postprint.pdf and http://imageweb.zoo.ox.ac.uk/pub/2009/citobase/cito-20091124-1.4/cito-content/owldoc/) It is important to note that CITO describes the nature of a citation, as opposed to describing the thing cited. It also suggests a different angle on what a citation is - that is, a citation is only a citation in context, otherwise it is simply a description of something. Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 20 Jul 2010, at 22:53, Young,Jeff (OR) wrote: > I suspect this discussion happened on code4lib before the thread got > cross-posted to LLD XG where I first saw it. > > There are undoubtedly a ton of diverse use cases, but that doesn't mean > APIs are the best solution. Here are some spitball possibilities for > "not just manifestations" and "we need page numbers". > > http://example.org/frbr:serial/2/citation-apa.{bcp-47}.txt > http://example.org/frbr:manifestation/1/citation-apa.{bcp-47}.txt?xyz:st > artPage=5&xyz:endPage=6 > > I'm imagining an xyz ontology with startPage and endPage, but we can > surely create it if something doesn't already exist. > > Jeff > >> -Original Message- >> From: Tom Morris [mailto:tfmor...@gmail.com] >> Sent: Tuesday, July 20, 2010 5:37 PM >> To: Young,Jeff (OR) >> Cc: Karen Coyle; Jodi Schneider; public-lld; Code for Libraries; Brian >> Mingus >> Subject: Re: "universal citation index" >> >> On Tue, Jul 20, 2010 at 1:40 PM, Young,Jeff (OR) >> wrote: >>> In terms of Linked Data, it should make sense to treat citations as >>> text/plain variant representations of a FRBR Manifestation. >> >> As Karen mentioned, many types of citation need more information than >> just the manifestation. You also need page numbers, etc.
>> >> Tom > > >
Re: [CODE4LIB] URL checking for the catalog
It's not quite the same thing, but I worked on a project a couple of years ago integrating references/citations into a learning environment (called Telstar http://www8.open.ac.uk/telstar/), and looked at the question of how to deal with broken links from references. We proposed a more reactive mechanism than running link checking software. This clearly has some disadvantages, but I think a major advantage is the targeting of staff time towards those links that are being used. The mechanism proposed was to add a level of redirection, with an intermediary script checking the availability of the destination URL before either:

a) passing the user on to the destination
b) finding the destination URL unresponsive (e.g. 404), automatically reporting the issue to library staff, and directing the user to a page explaining that the resource was not currently responding and that library staff had been informed

Particularly we proposed putting the destination URL into the rft_id of an OpenURL to achieve this, but this was only because it allowed us to piggyback on existing infrastructure using a standard approach - you could do the same with a simple script, with the destination URL as a parameter (if you are really interested, we created a new Source parser in SFX to do (a) and (b)). Because we didn't necessarily have control over the URL in the reference, we also built a table that allowed us to map broken URLs being used in the learning environment to alternative URLs so we could offer a temporary redirect while we worked with the relevant staff to get corrections made to the reference link. There's some more on this at http://www.open.ac.uk/blogs/telstar/remit-toc/remit-the-open-university-approach/remit-providing-links-to-resources-from-references/6-8-3-telstar-approach/ although for some reason (my fault) this doesn't include a write-up of the link checking process/code we created.
Of course, this approach is in no way incompatible with regular proactive link checking. Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 23 Feb 2012, at 17:02, Tod Olson wrote: > There's been some recent discussion at our site about revi(s|v)ing URL > checking in our catalog, and I was wondering if other sites have any > strategies that they have found to be effective. > > We used to run some home-grown link checking software. It fit nicely into a > shell pipeline, so it was easy to filter out sites that didn't want to be > link checked. But still the reports had too many spurious errors. And with > over a million links in the catalog, there are some issues of scale, both for > checking the links and consuming any report. > > Anyhow, if you have some system you use as part of catalog link maintenance, > or if there's some link checking software that you've had good experiences > with, or if there's some related experience you'd like to share, I'd like to > hear about it. > > Thanks, > > -Tod > > > Tod Olson > Systems Librarian > University of Chicago Library
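The reactive redirect-and-check mechanism Owen describes can be sketched as a small intermediary script. REMAP and report_to_staff below are hypothetical stand-ins for the remapping table and the staff alert; the availability check is injectable so the redirect logic can be exercised without the network:

```python
# Sketch of a reactive link-check redirect: apply any temporary URL
# remapping, check the destination, then either redirect (a) or
# report the broken link and show an apology page (b).
from urllib.request import Request, urlopen
from urllib.error import URLError

REMAP = {}  # broken URL -> temporary known-good alternative

def report_to_staff(url):
    print("broken link reported:", url)  # stand-in for a real staff alert

def is_alive(url, timeout=5):
    try:
        urlopen(Request(url, method="HEAD"), timeout=timeout)
        return True
    except (URLError, ValueError):
        return False

def follow(url, check=is_alive):
    url = REMAP.get(url, url)        # apply any temporary fix first
    if check(url):
        return ("redirect", url)     # (a) pass the user on
    report_to_staff(url)
    return ("apology-page", None)    # (b) explain; staff are informed
```

Because the check happens only when a link is actually followed, staff effort is concentrated on the links users really use, which is the advantage claimed above.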