Re: [CODE4LIB] Retrieving ISSN using a DOI

2014-03-05 Thread Owen Stephens
You should be able to use the content negotiation support on Crossref to get 
the metadata, which does include the ISSNs - or at least has the potential to 
if they are available. E.g. 

curl -LH "Accept: application/rdf+xml;q=0.5, 
application/vnd.citationstyles.csl+json;q=1.0" 
http://dx.doi.org/10.1126/science.169.3946.635

Gives 

{
  "subtitle": [],
  "subject": [
    "General"
  ],
  "issued": {
    "date-parts": [
      [
        1970,
        8,
        14
      ]
    ]
  },
  "score": 1.0,
  "prefix": "http://id.crossref.org/prefix/10.1126",
  "author": [
    {
      "family": "Frank",
      "given": "H. S."
    }
  ],
  "container-title": "Science",
  "page": "635-641",
  "deposited": {
    "date-parts": [
      [
        2011,
        6,
        27
      ]
    ],
    "timestamp": 130913280
  },
  "issue": "3946",
  "title": "The Structure of Ordinary Water: New data and interpretations are yielding new insights into this fascinating substance",
  "type": "journal-article",
  "DOI": "10.1126/science.169.3946.635",
  "ISSN": [
    "0036-8075",
    "1095-9203"
  ],
  "URL": "http://dx.doi.org/10.1126/science.169.3946.635",
  "source": "CrossRef",
  "publisher": "American Association for the Advancement of Science (AAAS)",
  "indexed": {
    "date-parts": [
      [
        2013,
        11,
        7
      ]
    ],
    "timestamp": 1383796678887
  },
  "volume": "169"
}
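
If you want to do this in a script rather than with curl, the ISSNs can be pulled straight out of that JSON. A minimal Python sketch (untested; it reuses the DOI and Accept header from the curl example above, and the ISSN key is only present if the publisher has deposited ISSNs):

# Minimal sketch: fetch CSL JSON for a DOI via content negotiation and print
# any ISSNs it contains. urllib follows the dx.doi.org redirect for us.
import json
import urllib.request

doi = "10.1126/science.169.3946.635"
req = urllib.request.Request(
    "http://dx.doi.org/" + doi,
    headers={"Accept": "application/vnd.citationstyles.csl+json"},
)
with urllib.request.urlopen(req) as response:
    record = json.loads(response.read().decode("utf-8"))

print(record.get("ISSN", []))  # e.g. ['0036-8075', '1095-9203']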


Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 5 Mar 2014, at 12:30, Graham, Stephen  wrote:

> OK, I've received a couple of emails telling me that the ISSN is not always 
> included in the DOI - that it depends on the publisher. So, I guess my 
> original question still stands!
> 
> Stephen
> 
> From: Graham, Stephen
> Sent: 05 March 2014 12:25
> To: 'CODE4LIB@LISTSERV.ND.EDU'
> Subject: RE: Retrieving ISSN using a DOI
> 
> Sorry - I've answered my own question. The ISSN is actually contained in the 
> DOI. Didn't realise this! D'oh!
> 
> Stephen
> 
> From: Graham, Stephen
> Sent: 05 March 2014 12:14
> To: 'CODE4LIB@LISTSERV.ND.EDU'
> Subject: Retrieving ISSN using a DOI
> 
> Hi All - is there a service/API that will return the ISSN if I provide the 
> DOI? I was hoping that the Crossref API would do this, but I can't see the 
> ISSN in the JSON it returns.
> 
> I'm adding a DOI field to our OPAC ILL form, so if the user has the DOI they 
> can use this to populate the form rather than add all the data manually. When 
> the user submits the form I'm querying our openURL resolver API to see if we 
> have access to the article. If we do then the form will alert the user and 
> provide a link. The query to the openURL resolver works better if we have the 
> ISSN, but if the user has used a DOI the ISSN is frustratingly never there.
> 
> Stephen
> 
> Stephen Graham
> Online Information Manager
> Information Collections and Services
> University of Hertfordshire, Hatfield.  AL10 9AB
> Tel. 01707 286111
> Email s.grah...@herts.ac.uk<mailto:s.grah...@herts.ac.uk>


Re: [CODE4LIB] tool for finding close matches in vocabular list

2014-03-21 Thread Owen Stephens
As Roy suggests, Open Refine is designed for this type of work and could easily 
deal with the volume you are talking about here. It can cluster terms using a 
variety of algorithms and easily apply a set of standard transformations.

The screencasts and info at http://freeyourmetadata.org/cleanup/ might be a 
good starting point if you want to see what Refine can do

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 21 Mar 2014, at 18:24, Ken Irwin  wrote:

> Hi folks,
> 
> I'm looking for a tool that can look at a list of all of subject terms in a 
> poorly-controlled index as possible candidates for term consolidation. Our 
> student newspaper index has about 16,000 subject terms and they include a lot 
> of meaningless typographical and nomenclatural difference, e.g.:
> 
> Irwin, Ken
> Irwin, Kenneth
> Irwin, Mr. Kenneth
> Irwin, Kenneth R.
> 
> Basketball - Women
> Basketball - Women's
> Basketball-Women
> Basketball-Women's
> 
> I would love to have some sort of pattern-matching tool that's smart about 
> this sort of thing that could go through the list of terms (as a text list, 
> database, xml file, or whatever structure it wants to ingest) and spit out 
> some clusters of possible matches.
> 
> Does anyone know of a tool that's good for that sort of thing?
> 
> The index is just a bunch of MySQL tables - there is no real controlled-vocab 
> system, though I've recently built some systems to suggest known SH's to 
> reduce this sort of redundancy.
> 
> Any ideas?
> 
> Thanks!
> Ken


Re: [CODE4LIB] semantic web browsers

2014-03-22 Thread Owen Stephens
Your findings reflect my experience - there isn't much out there, and what there 
is tends to be basic or doesn't work at all.
Link Sailor (http://linksailor.com) is another, but I suspect it's not actively 
maintained (it was developed by Ian Davis when he was at Talis doing linked data work).

I think the Graphite based browser from Southampton *does* support 
content-negotiation - what makes you think it doesn't?

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 22 Mar 2014, at 20:49, Eric Lease Morgan  wrote:

> Do you know of any working Semantic Web browsers?
> 
> Below is a small set of easy-to-use Semantic Web browsers. Give them URIs and 
> they allow you to follow and describe the links they include.
> 
>  * LOD Browser Switch (http://browse.semanticweb.org) - This is
>    really a gateway to other Semantic Web browsers. Feed it a URI
>    and it will create lists of URLs pointing to Semantic Web
>    interfaces, but many of the URLs (Semantic Web interfaces) do not
>    seem to work. Some of the resulting URLs point to RDF
>    serialization converters.
> 
>  * LodLive (http://en.lodlive.it) - This Semantic Web browser
>    allows you to feed it a URI and interactively follow the links
>    associated with it. URIs can come from DBpedia, Freebase, or one
>    of your own.
> 
>  * Open Link Data Explorer
>    (http://demo.openlinksw.com/rdfbrowser2/) - The most
>    sophisticated Semantic Web browser in this set. Given a URI it
>    creates various views of the resulting triples associated with
>    it, including lists of all its properties and objects, network
>    graphs, tabular views, and maps (if the data includes geographic
>    points).
> 
>  * Quick and Dirty RDF browser
>    (http://graphite.ecs.soton.ac.uk/browser/) - Given the URL
>    pointing to a file of RDF statements, this tool returns all the
>    triples in the file and verbosely lists each of their predicate
>    and object values. Quick and easy. This is good for reading
>    everything about a particular resource. The tool does not seem
>    to support content negotiation.
> 
> If you need some URIs to begin with, then try some of these:
> 
>  * Ray Family Papers - http://infomotions.com/sandbox/liam/data/mum432.rdf
>  * Catholics and Jews - 
> http://infomotions.com/sandbox/liam/data/shumarc681792.rdf
>  * Walt Disney via VIAF - http://viaf.org/viaf/36927108/
>  * origami via the Library of Congress - 
> http://id.loc.gov/authorities/subjects/sh85095643
>  * Paris from DBpedia - http://dbpedia.org/resource/Paris
> 
> To me, this seems like a really small set of browser possibilities. I’ve seen 
> others but could not get them to work very well. Do you know of others? Am I 
> missing something significant?
> 
> —
> Eric Lease Morgan


[CODE4LIB] Research Libraries UK Hack day

2014-04-04 Thread Owen Stephens
Just over a year and a half ago I posted about some work I was doing on behalf 
of Research Libraries UK (RLUK) who were looking at the potential of publishing 
several million of their bibliographic records (drawn from the major research 
libraries in the UK) as linked open data. In August last year RLUK announced it 
would join The European Library (TEL)[1], and would work with the team at TEL 
to publish RLUK data, along with other data held by The European Library, as 
linked open data. I'm happy to say that they are now very close to making the 
(approximately) 17 million RLUK records available. 

To start the process of working with the wider community of librarians, 
developers, and anyone interested in exploiting this data, RLUK is holding a 
hack day in London on 14th May, where the RLUK Linked Open Data will be 
introduced, along with the TEL API (OpenSearch based). There will be prizes (to 
be announced) for hacks in the following areas, which reflect the interests of 
RLUK and TEL:

• Linking Up datasets - a prize for work that combines data from 
multiple data sets
• WWI 
• Eastern Europe
• Delivering a valuable hack for RLUK members

The event is free and you can sign up now at 
https://www.eventbrite.co.uk/e/rluk-hack-day-rlukhack-tickets-11197529111 - I 
hope to see some of you there

Best wishes

Owen

1. http://www.rluk.ac.uk/news/rluk-joins-european-library/

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936


Re: [CODE4LIB] distributed responsibility for web content

2014-04-18 Thread Owen Stephens
I'd second the suggestions from Erin with regard to establishing style guides and 
Ross's suggestion of peer review. While not quite directly about the issue you 
have, Paul Boag, a UK web designer, has spoken and blogged on how clear policies 
relying on quantitative measures can (perhaps!) take some of the emotion out of 
decision making - e.g. see 
http://boagworld.com/business-strategy/website-animal/ - perhaps a similar 
approach might help here as well.

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 18 Apr 2014, at 15:15, Erin White  wrote:

> Develop a brief content and design style guide, then have it approved by
> your leadership team and share it with your organization. (Easier
> said than done, I know.) Bonus points if you work with your (typically)
> print-focused communications person to develop this guide and get his/her
> buy-in on creating content for the web.
> 
> A style guide sets expectations across the board and helps you when you
> need to play they heavy. As you need, you can e-mail folks with a link to
> the style guide, ask them to revise, and offer assistance or suggestions if
> they want.
> 
> Folks are grumpy about this at first, but generally appreciate the overall
> strategy to make the website more consistent and professional-looking. It
> ain't the wild wild west anymore - our web content is both functional and
> part of an overall communications strategy, and we need to treat it
> accordingly.
> 
> --
> Erin White
> Web Systems Librarian, VCU Libraries
> 804-827-3552 | erwh...@vcu.edu | www.library.vcu.edu
> 
> 
> On Fri, Apr 18, 2014 at 9:39 AM, Pikas, Christina K. <
> christina.pi...@jhuapl.edu> wrote:
> 
>> Laughing and feeling your pain... we have a communications person (that's
>> her job) who keeps using bold, italics, h1, in pink (yes pink), randomly in
>> pages... luckily she only does internal pages, and not external.
>> 
>> You could schedule some writing for the web sessions, but I don't know
>> that it will help. You could remove any text formatting... In the end, you
>> probably should just do as I do: close the page, breathe deeply, get up and
>> take a walk, and get on with other things.
>> 
>> Christina
>> 
>> -Original Message-
>> From: Code for Libraries [mailto:CODE4LIB@listserv.nd.edu] On Behalf Of
>> Simon LeFranc
>> Sent: Thursday, April 17, 2014 7:43 PM
>> To: CODE4LIB@listserv.nd.edu
>> Subject: [CODE4LIB] distributed responsibility for web content
>> 
>> My organization has recently adopted an enterprise Content Management
>> System. For the first time, staff across 8 divisions became web authors,
>> given responsibility for their division's web pages. Training on the
>> software, which has a WYSIWYG interface for editing, is available and with
>> practice, all are capable of mastering the basic tools. Some simple style
>> decisions were made for them, however, it is extremely difficult to get
>> these folks not to elaborate on or improvise new styles.  Examples:
>> 
>>    - making text red or another color in the belief that color will draw
>>      readers' attention
>>    - making text bold and/or italic and/or the size of a war-is-declared
>>      headline (see 1)
>>    - using images that are too small to be effective
>>    - adding a few more images that are too small to be effective
>>    - attempting to emphasize statements using ! or !! or !
>>    - writing in a too-informal tone ("Come on in outta the rain!") [We are a
>>      research organization and museum.]
>>    - feeling compelled to ornament pages with clipart, curlicues, et al.
>>    - centering everything
>> There is no one person in the organization with the time or authority to
>> act as editorial overseer. What are some techniques for ensuring that the
>> site maintains a clean, professional appearance?
>> 
>> Simon
>> 
>> 
>> 


Re: [CODE4LIB] barriers to open metadata?

2014-04-30 Thread Owen Stephens
Hi Laura,

I've done some work on this in the UK[1][2] and there have been a number of 
associated projects looking at the open release of library, archive and museum 
metadata[3].

For libraries (it is different for archives and museums) I think I'd sum up the 
reasons in three ways - in order of how commonly I think they apply:

a. Ignorance/lack of thought - libraries don't tend to license their metadata, 
and often make no statement about how it can be used - my experience is that 
often no-one has even asked the questions about licensing/data release
b. No business case - in the UK we talked to a group of university librarians 
and found that they didn't see a compelling business case for making open data 
releases of their catalogue records
c. Concern about breaking contractual agreements or impinging on 3rd party 
copyright over records. The Comet project at the University of Cambridge did a 
lot of work in this area[4]

As Roy notes, there have been some significant changes recently with OCLC and 
many national libraries releasing data under open licences. However, while this 
changes (c) it doesn't impact so much on (a) and (b) - so these remain 
fundamental issues, and I have an (unsubstantiated) concern that big data 
releases lead to libraries taking less interest ("someone else is doing this 
for us") rather than taking advantage of the clarity and openness these big 
data releases and associated announcements bring.

A final point - looking at libraries' behaviour in relation to 
institutional/open access repositories, where you'd expect at least (a) to be 
considered, unfortunately when I looked a couple of years ago I found similar 
issues. Working for the CORE project at the Open University[5] I found that 
OpenDOAR[6] listed "Metadata re-use policy explicitly undefined" for 57 out of 
125 UK repositories with OAI-PMH services. Only 18 repositories were listed as 
permitting commercial re-use of metadata. Hopefully this has improved in the 
intervening 2 years!

Hope some of this is helpful

Owen

1 Jisc Guide to Open Bibliographic Data http://obd.jisc.ac.uk
2 Jisc Discovery principles http://discovery.ac.uk/businesscase/principles/
3 Jisc Discovery Case studies http://guidance.discovery.ac.uk
4 COMET  http://cul-comet.blogspot.co.uk/p/ownership-of-marc-21-records.html
5 CORE blog http://core-project.kmi.open.ac.uk/node/32
6 OpenDOAR http://www.opendoar.org/

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 29 Apr 2014, at 21:06, Ben Companjen  wrote:

> Hi Laura,
> 
> Here are some reasons I may have overheard.
> 
> Stuck halfway: "We have an OAI-PMH endpoint, so we're open, right?"
> 
> Lack of funding for sorting out our own rights: "We gathered metadata from
> various sources and integrated the result - we even call ourselves Open
> L*y - but we [don't have manpower to figure out what we can do with
> it, so we added a disclaimer]."
> 
> Cultural: "We're not sure how to prevent losing the records' provenance
> after we released our metadata."
> 
> 
> Groeten van Ben
> 
> On 29-04-14 19:02, "Laura Krier"  wrote:
> 
>> Hi Code4Libbers,
>> 
>> I'd like to find out from as many people as are interested what barriers
>> you feel exist right now to you releasing your library's bibliographic
>> metadata openly. I'm curious about all kinds of barriers: technical,
>> political, financial, cultural. Even if it seems obvious, I'd like to hear
>> about it.
>> 
>> Thanks in advance for your feedback! You can send it to me privately if
>> you'd prefer.
>> 
>> Laura
>> 
>> -- 
>> Laura Krier
>> 
>> laurapants.com<http://laurapants.com/?utm_source=email_sig&utm_medium=emai
>> l&utm_campaign=email>


Re: [CODE4LIB] Any good "introduction to SPARQL" workshops out there?

2014-05-01 Thread Owen Stephens
I contributed to a session like this in the UK aimed at cataloguers/metadata 
librarians 
http://www.cilip.org.uk/cataloguing-and-indexing-group/events/linked-data-what-cataloguers-need-know-cig-event.
All the slide decks used are available at 
http://www.cilip.org.uk/cataloguing-and-indexing-group/linked-data-what-cataloguers-need-know
Specifically my introduction to SPARQL slides are at 
http://www.slideshare.net/ostephens/selecting-with-sparql-using-british-national-bibliography-as,
 and link to various example SPARQL queries that can be run on the BNB SPARQL 
endpoint (SPARQL examples are all Gists at https://gist.github.com/ostephens)

Not sure about the practicalities of bringing this to staff in the US, although 
planning is in progress for another event in the UK along the same lines. I'd 
be happy to put you in touch with the relevant people on the committee to 
see if there is any possibility of having it webcast if there is interest.

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 1 May 2014, at 17:23, Hutt, Arwen  wrote:

> We're interested in an introduction to SPARQL workshop for a smallish group 
> of staff.  Specifically an introduction for fairly tech comfortable 
> non-programmers (in our case metadata librarians), as well as a refresher for 
> programmers who aren't using it regularly.
> 
> Ideally (depending on cost) we'd like to bring the workshop to our staff, 
> since it'll allow more people to attend, but any recommendations for good 
> introductory workshops or tutorials would be welcome!
> 
> Thanks!
> Arwen
> 
> 
> Arwen Hutt
> Head, Digital Object Metadata Management Unit
> Metadata Services, Geisel Library
> University of California, San Diego
> 


Re: [CODE4LIB] Is ISNI / ISO 27729:2012 a name identifier or an entity identifier?

2014-06-19 Thread Owen Stephens
An aside but interesting to see how some of this identity stuff seems to be 
playing out in the wild now. Google for Catherine Sefton:

https://www.google.co.uk/search?q=catherine+sefton

The Knowledge Graph displays information about Martin Waddell. Catherine Sefton 
is a pseudonym of Martin Waddell. It is impossible to know, but the most likely 
source of this knowledge is Wikipedia which includes the ISNI for Catherine 
Sefton in the Wikipeda page for Martin Waddell 
(http://en.wikipedia.org/wiki/Martin_Waddell) (although oddly not the ISNI for 
Martin Waddell under his own name).

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 18 Jun 2014, at 23:28, Stuart Yeates  wrote:

> My reading of that suggests that 
> http://isni-url.oclc.nl/isni/000122816316 shouldn't have both "Bell, 
> Currer" and "Brontë, Charlotte", which it clearly does...
> 
> Is this a case of one of our sources of truth not distinguishing between 
> identities and entities, and of us allowing it to pollute our data?
> 
> If that source of truth is wikipedia, we can fix that.
> 
> cheers
> stuart
> 
> On 06/19/2014 12:11 AM, Richard Wallis wrote:
>> Hi all,
>> 
>> Seeing this thread I checked with the ISNI team and got the following
>> answer from Janifer Gatenby who asked me to post it on her behalf:
>> 
>> ISNI identifies “public identities”. The scope as stated in the standard
>> is
>> 
>> 
>> 
>> “This International Standard specifies the International Standard name
>> identifier (ISNI) for the identification of public identities of parties;
>> that is, the identities used publicly by parties involved throughout the
>> media content industries in the creation, production, management, and
>> content distribution chains.”
>> 
>> 
>> 
>> The relevant definitions are:
>> 
>> 3.1 party
>> natural person or legal person, whether or not incorporated, or a group of
>> either
>> 
>> 3.3 public identity
>> Identity of a party (3.1) or a fictional character that is or was
>> presented to the public
>> 
>> 3.4 name
>> character string by which a public identity (3.3) is or was commonly
>> referenced
>> 
>> 
>> 
>> A party may have multiple public identities and a public identity may have
>> multiple names (e.g. pseudonyms)
>> 
>> 
>> 
>> ISNI data is available as linked data.  There are currently 8 million ISNIs
>> assigned and 16 million links.
>> 
>> 
>> ~Richard.
>> 
>> 
>> On 16 June 2014 10:54, Ben Companjen  wrote:
>> 
>>> Hi Stuart,
>>> 
>>> I don't have a copy of the official standard, but from the documents on
>>> the ISNI website I remember that there are name variations and 'public
>>> identities' (as the lemma on Wikipedia also uses). I'm not sure where the
>>> borderline is or who decides when different names are different identities.
>>> 
>>> If it were up to me: pseudonyms are definitely different public
>>> identities, name changes after marriage probably not, name change after
>>> gender change could mean a different public identity. Different public
>>> identities get different ISNIs; the ISNI organisation says the ISNI system
>>> can keep track of connected public identities.
>>> 
>>> Discussions about name variations or aliases are not new, of course. I
>>> remember the discussions about 'aliases' vs 'Artist Name Variations' that
>>> are/were happening on Discogs.com, e.g. 'is J Dilla an alias or a ANV of
>>> Jay Dee?' It appears the users on Discogs finally went with aliases, but
>>> VIAF put the names/identities together: http://viaf.org/viaf/32244000 -
>>> and there is no ISNI (yet).
>>> 
>>> It gets more confusing when you look at Washington Irving who had several
>>> pseudonyms: they are just listed under one ISNI. Maybe because he is dead,
>>> or because all other databases already know and connected the pseudonyms
>>> to the birth name? (I just sent a comment asking about the record at
>>> http://isni.org/isni/000121370797 )
>>> 
>>> 
>>> [Here goes the reference list…]
>>> 
>>> Hope this helps :)
>>> 
>>> Groeten van Ben

[CODE4LIB] 'automation' tools

2014-07-04 Thread Owen Stephens
I'm doing a workshop in the UK at a library tech unconference-style event (Pi 
and Mash http://piandmash.info) on automating computer based tasks.
I want to cover tools that are usable by non-programmers and that would work in 
a typical library environment. The types of tools I'm thinking of are:

MacroExpress
AutoHotKey
iMacros for Firefox

While I'm hoping workshop attendees will bring ideas about tasks they would 
like to automate, the types of thing I have in mind are things like:

Filling out a set of standard data on a GUI or Web form (e.g. standard set of 
budget codes for an order)
Processing a list of item barcodes from a spreadsheet and doing something with 
them on the library system (e.g. change loan status, check for holds)
Similarly for User IDs
Navigating to a web page and doing some task 

Clearly some of these tasks would be better automated with appropriate APIs and 
scripts, but I want to try to introduce those without programming skills to 
some of the concepts and tools and essentially how they can work around 
problems themselves to some extent.

What tools do you use for this kind of automation task, and what kind of tasks 
do they best deal with?

Thanks,

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936


Re: [CODE4LIB] 'automation' tools

2014-07-07 Thread Owen Stephens
Thanks Riley and Andrew for these pointers - some great stuff in there

Other tools and examples still very welcome :)

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 4 Jul 2014, at 15:04, Andrew Weidner  wrote:

> Great idea for a workshop, Owen.
> 
> My staff and I use AutoHotkey every day. We have some apps for data
> cleaning in the CONTENTdm Project Client that I presented on recently:
> http://scholarcommons.sc.edu/cdmusers/cdmusersMay2014/May2014/13/. I'll be
> talking about those in more detail at the Upper Midwest Digital Collections
> Conference <http://www.wils.org/news-events/wilsevents/umdcc/> if anyone is
> interested.
> 
> I did an in-house training session for our ILS and database management
> folks on a simple AHK app that they now use for repetitive data entry:
> https://github.com/metaweidner/AutoType. When I was working with digital
> newspapers I developed a suite of tools for making repetitive quality
> review tasks easier: https://github.com/drewhop/AutoHotkey/wiki/NDNP_QR
> 
> Basic AHK scripts are really great for text wrangling. Just yesterday I
> wrote a script to grab some values from a spreadsheet, remove commas from
> the numbers, and dump them into a tab delimited file in the format that we
> need. That script will become part of our regular workflow. Wrote another
> one-off script to transform labels on our wiki into links. It wrapped the
> labels in the wiki link syntax, and then I copied and pasted the unique
> URLs into the appropriate spots.
> 
> It's also useful for keeping things organized. I have a set of scripts that
> open up frequently used network drive folders and applications, and I
> packaged them as drop down menu choices in a little GUI that's always open
> on the desktop. We have a few search scripts that either grab values from a
> spreadsheet or input box and then run a search for those terms in a web
> database (e.g. id.loc.gov).
> 
> You might check out Selenium IDE for working with web forms:
> http://docs.seleniumhq.org/projects/ide/. The recording feature makes it
> really easy to get started with as an automation tool. I've used it
> extensively for automated metadata editing:
> http://digital.library.unt.edu/ark:/67531/metadc86138/m1/1/
> 
> Cheers!
> 
> Andrew
> 
> 
> On Fri, Jul 4, 2014 at 6:54 AM, Riley Childs  wrote:
> 
>> Don't forget AutoIT (auto IT, pretty clever eh?)
>> http://www.autoitscript.com/site/autoit/
>> 
>> Riley Childs
>> Student
>> Asst. Head of IT Services
>> Charlotte United Christian Academy
>> (704) 497-2086
>> RileyChilds.net
>> Sent from my Windows Phone, please excuse mistakes
>> 
>> -Original Message-
>> From: "Owen Stephens" 
>> Sent: ‎7/‎4/‎2014 4:55 AM
>> To: "CODE4LIB@LISTSERV.ND.EDU" 
>> Subject: [CODE4LIB] 'automation' tools
>> 
>> I'm doing a workshop in the UK at a library tech unconference-style event
>> (Pi and Mash http://piandmash.info) on automating computer based tasks.
>> I want to cover tools that are usable by non-programmers and that would
>> work in a typical library environment. The types of tools I'm thinking of
>> are:
>> 
>> MacroExpress
>> AutoHotKey
>> iMacros for Firefox
>> 
>> While I'm hoping workshop attendees will bring ideas about tasks they
>> would like to automate the type of thing I have in mind are things like:
>> 
>> Filling out a set of standard data on a GUI or Web form (e.g. standard set
>> of budget codes for an order)
>> Processing a list of item barcodes from a spreadsheet and doing something
>> with them on the library system (e.g. change loan status, check for holds)
>> Similarly for User IDs
>> Navigating to a web page and doing some task
>> 
>> Clearly some of these tasks would be better automated with appropriate
>> APIs and scripts, but I want to try to introduce those without programming
>> skills to some of the concepts and tools and essentially how they can work
>> around problems themselves to some extent.
>> 
>> What tools do you use for this kind of automation task, and what kind of
>> tasks do they best deal with?
>> 
>> Thanks,
>> 
>> Owen
>> 
>> Owen Stephens
>> Owen Stephens Consulting
>> Web: http://www.ostephens.com
>> Email: o...@ostephens.com
>> Telephone: 0121 288 6936
>> 


Re: [CODE4LIB] coders who library? [was: Let me shadow you, librarians who code!]

2014-07-07 Thread Owen Stephens
I'm a librarian first, and a slightly poor excuse for a coder second. I've always 
focussed on the IT/tech side of librarianship in my career and did at one point 
cross from libraries into more general IT management - then firmly put myself 
back into libraries. To a certain extent I left library employment to freelance 
as a consultant to get out of the academic library career path that kept taking 
me into management - which I realised, after several years doing it, was just 
not what got me out of bed in the morning.

There is a name for people without an MLS who can still quote MARC subfields or 
write MODS XML freehand. http://shambrarian.org :)


Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 7 Jul 2014, at 15:36, Miles Fidelman  wrote:

> This recent spate of message leads me to wonder: How many folks here who 
> "code for libraries" have a library science degree/background, vs. folks who 
> come from other backgrounds?  What about folks who end up in technology 
> management/direction positions for libraries?
> 
> Personally: Computer scientist and systems engineer, did some early 
> Internet-in-public library deployments, got to write a book about it.  Not 
> actively doing library related work at the moment.
> 
> Miles Fidelman
> 
> 
> Dot Porter wrote:
>> I'm a medieval manuscripts curator who codes, in Philadelphia, and I'd be
>> happy to talk to you as well.
>> 
>> Dot
>> 
>> 
>> On Tue, Jul 1, 2014 at 10:30 AM, David Mayo  wrote:
>> 
>>> If you'd like to talk to someone who did a library degree, and currently
>>> works as a web developer supporting an academic library, I'd be happy to
>>> talk with you.
>>> 
>>> - Dave Mayo
>>>   Software Engineer @ Harvard > HUIT > LTS
>>> 
>>> 
>>> On Tue, Jul 1, 2014 at 10:12 AM, Steven Anderson <
>>> stevencander...@hotmail.com> wrote:
>>> 
>>>> Jennie,
>>>> As with others, I'm not a librarian as I lack a library degree, but I do
>>>> Digital Repository Development for the Boston Public Library
>>> (specifically:
>>>> https://www.digitalcommonwealth.org/). Feel free to let me know you want
>>>> to chat for your masters paper.
>>>> Sincerely,
>>>> Steven Anderson
>>>> Web Services - Digital Library Repository developer
>>>> 617-859-2393
>>>> sander...@bpl.org
>>>> 
>>>>> Date: Tue, 1 Jul 2014 13:51:07 +
>>>>> From: mschofi...@nova.edu
>>>>> Subject: Re: [CODE4LIB] Let me shadow you, librarians who code!
>>>>> To: CODE4LIB@LISTSERV.ND.EDU
>>>>> 
>>>>> Hey Jennie,
>>>>> 
>>>>> I'm waaay south of MA but I'm pretty addicted to talking about coding
>>> as
>>>> a library job O_o. If you are still in want of guinea-pigs, I'd love to
>>>> skype / hangout.
>>>>> Michael Schofield
>>>>> // mschofi...@nova.edu
>>>>> 
>>>>> -Original Message-
>>>>> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
>>> Of
>>>> Jennie Rose Halperin
>>>>> Sent: Monday, June 30, 2014 3:58 PM
>>>>> To: CODE4LIB@LISTSERV.ND.EDU
>>>>> Subject: [CODE4LIB] Let me shadow you, librarians who code!
>>>>> 
>>>>> hey Code4Lib,
>>>>> 
>>>>> Do you work in a library and also like coding?  Do you do coding as
>>> part
>>>> of your job?
>>>>> I'm writing my masters paper for the University of North Carolina at
>>>> Chapel Hill and I'd like to shadow and interview up to 10 librarians and
>>>> archivists who also work with code in some way in the Boston area for the
>>>> next two weeks.
>>>>> I'd come by and chat for about 2 hours, and the whole thing will not
>>>> take up too much of your time.
>>>>> Not in Massachusetts?  Want to skype? Let me know and that would be
>>>> possible.
>>>>> I know that this list has a pretty big North American presence, but I
>>>> will be in Berlin beginning July 14, and could potentially shadow anyone
>>> in
>>>> Germany as well.
>>>>> Best,
>>>>> 
>>>>> Jennie Rose Halperin
>>>>> jennie.halpe...@gmail.com
>>>> 
>> 
>> 
> 
> 
> -- 
> In theory, there is no difference between theory and practice.
> In practice, there is.    Yogi Berra


Re: [CODE4LIB] 'automation' tools

2014-07-07 Thread Owen Stephens
Thanks again all,

I love OpenRefine - I've been working on the GOKb project (http://gokb.org) 
where K-Int (a UK based company) have developed an extension for OpenRefine 
which helps with the cleaning of data about electronic resources (esp. 
journals) from publishers and then pushes it into the GOKb database. The 
extension is fully integrated into the GOKb database, but if anyone wants a look 
the code is at https://github.com/k-int/gokb-phase1/tree/dev/refine. The extension 
checks the data and reports errors as well as offering ways of fixing common 
issues - there's more on the wiki 
https://wiki.kuali.org/display/OLE/OpenRefine+How-Tos

I did pitch an OpenRefine workshop for the same event as a 'data 
wrangling/cleaning' tool but the 'automation' session got the vote in the end - 
although there is definitely overlap. However I am delivering an OpenRefine 
workshop at the British Library next week - and great to see it is getting used 
across libraries.

Google Docs Spreadsheets is also a great tip - I've run a course at the 
British Library which uses this to introduce the concept of APIs to 
non-techies. I blogged the original tutorial at 
http://www.meanboyfriend.com/overdue_ideas/2013/02/introduction-to-apis/ but a 
change to the BL open data platform means this no longer works :((

Thanks all again - I'll be trying to put stuff from the automation workshop 
online at some point and I'll post here when there is something up.

Best wishes,

Owen


Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 8 Jul 2014, at 03:52, davesgonechina  wrote:

> +1 to OpenRefine. Some extensions, like RDF Refine <http://refine.deri.ie/>,
> currently only work with the old Google Refine (still available here
> <https://code.google.com/p/google-refine/>). There's a good deal of
> interesting projects for OpenRefine on GitHub and GitHub Gist.
> 
> Google Docs Spreadsheets also has a surprising amount of functionality,
> such as importXML if you're willing to get your hands dirty with regular
> expressions.
> 
> Dave
> 
> 
> On Tue, Jul 8, 2014 at 3:12 AM, Tillman, Ruth K. (GSFC-272.0)[CADENCE GROUP
> ASSOC]  wrote:
> 
>> Definite cosign on Open Refine. It's intuitive and spreadsheet-like enough
>> that a lot of people can understand it. You can do anything from
>> standardizing state names you get from a patron form to normalizing
>> metadata keywords for a database, so I think it'd be useful even for
>> non-techies.
>> 
>> Ruth Kitchin Tillman
>> Metadata Librarian, Cadence Group
>> NASA Goddard Space Flight Center Library, Code 272
>> Greenbelt, MD 20771
>> Goddard Library Repository: http://gsfcir.gsfc.nasa.gov/
>> 301.286.6246
>> 
>> 
>> -Original Message-
>> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
>> Terry Brady
>> Sent: Monday, July 07, 2014 1:35 PM
>> To: CODE4LIB@LISTSERV.ND.EDU
>> Subject: Re: [CODE4LIB] 'automation' tools
>> 
>> I learned about Open Refine <http://openrefine.org/> at the Code4Lib
>> conference, and it looks like it would be a great tool for normalizing
>> data.  I worked on a few projects in the past in which this would have been
>> very helpful.
>> 


[CODE4LIB] Automation tools - session at the "Pi and Mash" unconference

2014-08-11 Thread Owen Stephens
Dear all,

A month or so ago I asked for recommendations for automation tools that people 
used in libraries to help inform a session I was going to run. The unconference 
event (Pi and Mash) ran this weekend, and I just wanted to share the materials 
I wrote for the session in case they are of any help. The materials consist of 
a slidedeck called "Automated Love Presentation" (available as Keynote, 
Powerpoint and PDF) and some examples and exercises you can work through in a 
document called "Automated Love Examples" (available as Pages, Word doc, PDF 
and ePub). There are also two accompanying files 'ISBNs.xlsx' and 'isbns.csv' 
which are used in the examples/exercises.

All materials are available at http://bit.ly/automatedlovefolder

Thanks to all who made suggestions which contributed towards the session.

Best wishes,

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936


Re: [CODE4LIB] Automated searching of Copac/Worldcat

2014-08-13 Thread Owen Stephens
The worksheets I circulated earlier in the week include examples of how to take 
a list of ISBNs from a spreadsheet/csv file and search on Worldcat (see the 
'Automated Love Examples' docs in http://bit.ly/automatedlovefolder)
What these examples don't include is how to check the outcome of the search 
automatically or record it.

I think it would be relatively easy to extend the iMacros example to extract a 
hit count / 'no hits' message and write this to a file using the iMacros SAVEAS 
command, but I haven't tried this. For a 'no results' check you'd want to look 
for the presence of (or extract the contents of) a div with id=div-results-none. 
For a results count you'd want to look for the contents of a table within 
the div with class=resultsinfo.
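
As a very rough illustration of that check outside iMacros (untested - the 
div-results-none marker is as described above, but the WorldCat search URL 
pattern and file names are assumptions), something like this in Python would 
record the ISBNs with no matches:

# Rough sketch: run each ISBN from a CSV file against WorldCat and write out
# the ones whose results page contains the 'no results' div.
import csv
import time
import urllib.request

def worldcat_has_results(isbn):
    url = "http://www.worldcat.org/search?q=isbn%3A" + isbn
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return "div-results-none" not in html

with open("isbns.csv") as infile, open("no_matches.csv", "w", newline="") as outfile:
    writer = csv.writer(outfile)
    for row in csv.reader(infile):
        isbn = row[0].strip()
        if isbn and not worldcat_has_results(isbn):
            writer.writerow([isbn])
        time.sleep(1)  # be gentle with the service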

Alternatively you could look at the Selenium IDE extension for Firefox which is 
more complex but allows more sophisticated approach to checking and writing out 
information about text present/absent in web pages retrieved.

Hope that is of some help

Owen



Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 13 Aug 2014, at 11:20, Nicholas Brown  wrote:

> Apologies for cross posting
> 
> Dear collective wisdom,
> 
> I'm interested in using automation software such as Macro Express or iMacros 
> to feed a list of ISBNs from a spreadsheet into Copac or Worldcat and output 
> a list of those that return no matches in the results screen. The idea would 
> be to create a tool that can quickly, although rather roughly, identify rare 
> items in a collection (though obviously this would be limited to items with 
> ISBNs or other unique identifiers). I can write a macro which will 
> sequentially search either catalogue for a list of ISBNs but am struggling 
> with how to have the macro identify items with no matches (I have a vague 
> idea about searching the results screen for the text "Sorry, there are no 
> search results") and to compile them back into a spreadsheet.
> 
> I'd be keen to hear if anyone has attempted something similar, general 
> advice, any potential pitfalls in the method outlined above or suggestions 
> for a better way to achieve the same results. If something useful comes of it 
> I'd be happy to share the results. 
> 
> Many thanks for your help,
> Nick 
> 
> Nicholas Brown
> Library and Information Manager
> nbr...@iniva.org
> +44 (0)20 7749  1125
> www.iniva.org


Re: [CODE4LIB] IFTTT and barcodes

2014-09-11 Thread Owen Stephens
As noted by Tara, when using IFTTT (or similar tools like Bip.io and WappWolf) 
you are limited to the channels/services the tool has already integrated. You 
are also in the position of having to give a third party service access to 
personal information and the ability to read/write certain services.

I was investigating these types of services very briefly for a recent workshop 
and I came across an open source alternative called Huginn which you can run on 
your own server and of course can extend to work with whatever 
services/channels you want. I thought it looked interesting - available from 
https://github.com/cantino/huginn

Overkill for this particular problem but may be of more general interest

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 11 Sep 2014, at 08:21, Sylvain Machefert  wrote:

> Hello,
> maybe that an easier solution, more IFTTT related, would be to develop a 
> Yahoo pipe, using the ISBN & querying the webpac should be easy for Yahoo 
> Pipes, you can then search in the page using xpath or thing like that. Should 
> be easier thant developping a custom script (if you have no development 
> knowledge, ortherwise it should be scripted easily in PHP, python, whatever).
> 
> I haven't used YPipes in a long time but I think it's worth looking at it.
> 
> Sylvain
> 
> 
> Le 10/09/2014 21:48, Ian Walls a écrit :
>> I don't think IFTTT is the right tool, but the basic idea is sound.
>> 
>> With a spot of custom scripting on some server somewhere, one could take in
>> an ISBN, lookup via the III WebPac, assess eligibility conditions, then
>> return yes or no.  Barcode Scanner on Android has the ability to do custom
>> search URLs, so if your yes/no script can accept URL params, then you should
>> be all set.
>> 
>> Barring a script, just a lookup of the MARC record may be possible, and if
>> it was styled in a mobile-friendly manner, perhaps you could quickly glean
>> whether it's okay or not for copy cataloging.
>> 
>> Side question: is there connectivity in the stacks for doing this kind of
>> lookup?  I know in my library, that's not always the case.
>> 
>> 
>> -Ian
>> 
>> -Original Message-
>> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
>> Riley Childs
>> Sent: Wednesday, September 10, 2014 3:31 PM
>> To: CODE4LIB@LISTSERV.ND.EDU
>> Subject: Re: [CODE4LIB] IFTTT and barcodes
>> 
>> Webhooks via the WordPress channel?
>> 
>> Riley Childs
>> Senior
>> Charlotte United Christian Academy
>> Library Services Administrator
>> IT Services
>> (704) 497-2086
>> rileychilds.net
>> @rowdychildren
>> 
>> From: Tara Robertson<mailto:trobert...@langara.bc.ca>
>> Sent: ‎9/‎10/‎2014 3:03 PM
>> To: CODE4LIB@LISTSERV.ND.EDU<mailto:CODE4LIB@LISTSERV.ND.EDU>
>> Subject: Re: [CODE4LIB] IFTTT and barcodes
>> 
>> Hi,
>> 
>> I don't think this is possible using IFTTT right now as existing channels
>> don't exist to create a recipe. I'm trying to think of what those channels
>> would be and can't quite...I don't think IFTTT is the best tool for this
>> task.
>> 
>> What ILS are you using? Could you hook a barcode scanner up to a tablet and
>> scan, then check the MARC...nah, that's seeming almost as time consuming as
>> taking it to your desk (depending on how far your desk is).
>> I recall at an Evergreen hackfest that someone was tweaking the web
>> interface for an inventory type exercise, where it would show red or green
>> depending on some condition.
>> 
>> Cheers,
>> Tara
>> 
>> On 10/09/2014 11:52 AM, Harper, Cynthia wrote:
>>> Now that someone has mentioned IFTTT, I'm reading up on it and wonder if
>> it could make this task possible:
>>> One of my tasks is copy cataloging. I'm only authorized to do LC copy,
>> which involves opening the record (already downloaded in the acq process),
>> and checking to see that 490 doesn't exist (I can't handle series), and
>> looking for DLC in the 040 |a and |c.
>>> It's discouraging when I take 10 books back to my desk from the cataloging
>> shelf, and all 10 are not eligible for copy cataloging.
>>> S...  could I take my phone to the cataloging shelf, use IFTTT to scan
>> my ISBN, search in the III Webpac, look at the MARc record and tell me
>> whether it's LC copy?
>>> Empower the frontline workers! :)
>>> 
>>> Cindy Harper
>>> Electronic Services and Serials Librarian Virginia Theological
>>> Seminary
>>> 3737 Seminary Road
>>> Alexandria VA 22304
>>> 703-461-1794
>>> char...@vts.edu
>> 
>> --
>> 
>> Tara Robertson
>> 
>> Accessibility Librarian, CAPER-BC <http://caperbc.ca/> T  604.323.5254 F
>> 604.323.5954 trobert...@langara.bc.ca
>> <mailto:tara%20robertson%20%3ctrobert...@langara.bc.ca%3E>
>> 
>> Langara. <http://www.langara.bc.ca>
>> 
>> 100 West 49th Avenue, Vancouver, BC, V5Y 2Z6


Re: [CODE4LIB] Python or Perl script for reading RDF/XML, Turtle, or N-triples Files

2014-09-30 Thread Owen Stephens
I've not tried using the LCNAF RDF files, and I've not used RDFLib, but a 
couple of things from (a relatively small amount of) experience parsing RDF:

Don't try to parse the RDF/XML, use n-triples instead
As Kyle mentioned, you might want to use command line tools to strip down the 
n-triples to only deal with data you actually want
Rapper and the Redland RDF libraries are a good place to start, and have 
bindings to Perl, PHP, Python and Ruby (http://librdf.org/raptor/rapper.html 
and http://librdf.org). This StackOverflow Q&A might help getting started 
http://stackoverflow.com/questions/5678623/how-to-parse-big-datasets-using-rdflib
If you want to move between RDF formats an alternative to Rapper is 
http://www.l3s.de/~minack/rdf2rdf/ - this succeeded in converting a file of 48 
million triples in ttl to ntriples where Rapper failed with an 'out of memory' 
error (once in ntriples, Rapper can be used for further parsing)
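
As a small illustration of the 'strip down the n-triples first' point: because 
n-triples is one triple per line, you can stream even a very large dump and keep 
only the statements you care about before doing any proper RDF parsing. A rough 
Python sketch (the filenames and the predicate are just placeholders):

# Sketch: stream a large (gzipped) n-triples file and keep only lines using
# one predicate, writing them to a much smaller file for later parsing.
import gzip

WANTED_PREDICATE = "<http://www.w3.org/2004/02/skos/core#prefLabel>"

with gzip.open("lcnaf.nt.gz", "rt", encoding="utf-8") as infile, \
        open("preflabels.nt", "w", encoding="utf-8") as outfile:
    for line in infile:
        if WANTED_PREDICATE in line:
            outfile.write(line)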


Some slightly random advice there, but maybe some of it will be useful!

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 30 Sep 2014, at 15:54, Jeremy Nelson  
wrote:

> Hi Jean,
> I've found rdflib (https://github.com/RDFLib/rdflib) on the Python side 
> exceeding simple to work with and use. For example, to load the current 
> BIBFRAME vocabulary as an RDF graph using a Python shell:
> 
>>> import rdflib
>>> bf_vocab = rdflib.Graph().parse('http://bibframe.org/vocab/')
>>> len(bf_vocab) # Total number of triples
> 1683
>>> set([s for s in bf_vocab]) # A set of all unique subjects in the graph
> 
> 
> This module offers RDF/XML, Turtle, or N-triples support and with various 
> options for retrieving and manipulating the graph's subjects, predicate, and 
> objects. I would advise installing the JSON-LD 
> (https://github.com/RDFLib/rdflib-jsonld) extension as well.
> 
> Jeremy Nelson
> Metadata and Systems Librarian
> Colorado College
> 
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jean 
> Roth
> Sent: Tuesday, September 30, 2014 8:14 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] Python or Perl script for reading RDF/XML, Turtle, or 
> N-triples Files
> 
> Thank you so much for the reply.
> 
> I have not investigated the LCNAF data set thoroughly.  However, my 
> default/ideal is to read in all variables from a dataset.  
> 
> So, I was wondering if any one had an example Python or Perl script for 
> reading RDF/XML, Turtle, or N-triples file.  A simple/partial example would 
> be fine.
> 
> Thanks,
> 
> Jean
> 
> On Mon, 29 Sep 2014, Kyle Banerjee wrote:
> 
> KB> The best way to handle them depends on what you want to do. You need 
> KB> to actually download the NAF files rather than countries or other 
> KB> small files as different kinds of data will be organized 
> KB> differently. Just don't try to read multigigabyte files in a text 
> KB> editor :)
> KB> 
> KB> If you start with one of the giant XML files, the first thing you'll 
> KB> probably want to do is extract just the elements that are 
> KB> interesting to you. A short string parsing or SAX routine in your 
> KB> language of choice should let you get the information in a format you 
> like.
> KB> 
> KB> If you download the linked data files and you're interested in 
> KB> actual headings (as opposed to traversing relationships), grep and 
> KB> sed in combination with the join utility are handy for extracting 
> KB> the elements you want and flattening the relationships into 
> KB> something more convenient to work with. But there are plenty of other 
> tools that you could also use.
> KB> 
> KB> If you don't already have a convenient environment to work on, I'm a  
> KB> fan of virtualbox. You can drag and drop things into and out of your 
> KB> regular desktop or even access it directly. That way you can 
> KB> view/manipulate files with the linux utilities without having to 
> KB> deal with a bunch of clunky file transfer operations involving 
> KB> another machine. Very handy for when you have to deal with multigigabyte 
> files.
> KB> 
> KB> kyle
> KB> 
> KB> On Mon, Sep 29, 2014 at 11:19 AM, Jean Roth  wrote:
> KB> 
> KB> > Thank you!  It looks like the files are available as  RDF/XML, 
> KB> > Turtle, or N-triples files.
> KB> >
> KB> > Any examples or suggestions for reading any of these formats?
> KB> >
> KB> > The MARC Countries file is small, 31-79 kb.  I assume a script 
> KB> > that would read a small file like that would at least be a start 
> KB> > for the LCNAF
> KB> >
> KB> >
> KB> 


Re: [CODE4LIB] ISSN lists?

2014-10-17 Thread Owen Stephens
It may depend on exactly what you need.

The ISSN Centre offer licensed access to their ISSN portal at a cost 
http://www.issn.org - my experience is that this is pretty comprehensive
The ISSN Centre also offer a download of ISSN-L tables - this is available for 
free (although you have to state what you intend to do with it before you can 
download) - this is just ISSNs (mapped to their ISSN-Ls) but if you don't need 
bibliographic details then it would be a good source
As well as WorldCat you could also try Suncat which offers a z39.50 connection 
http://www.suncat.ac.uk/support/z-target.shtml, but obviously this has the same 
issue as the WorldCat approach
GOKb and KB+ are both initiatives trying to build knowledgebases containing 
many ISSNs with data to be made available under a CC0 declaration. Both of 
these are focussed on describing bundles/packages of journals. GOKb is going to 
be going into preview imminently (http://gokb.org/news) and KB+ already offers 
downloads http://www.kbplus.ac.uk/kbplus/publicExport. KB+ currently has 
details of around 25k journals.
There may also be some largescale open data initiatives that give you a 
reasonably good set of ISSNs. For example the RLUK release of 60m+ records at 
http://www.theeuropeanlibrary.org/tel4/access/data/lod, or the 12million 
records released by Harvard http://openmetadata.lib.harvard.edu/bibdata (both 
CC0)

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 17 Oct 2014, at 03:16, Stuart Yeates  wrote:

> My understanding is that there is no universal ISSN list but that worldcat 
> allows querying of their database by ISSN. 
> 
> Which method of sampling the ISSN namespace is going to cause least pain? 
> http://www.worldcat.org/ISSN/ seems to be the one talked about, but is there 
> another that's less resource intensive? Maybe someone's already exported this 
> data?
> 
> cheers
> stuart
> --
> I have a new phone number: 04 463 5692


Re: [CODE4LIB] Linux distro for librarians

2014-10-21 Thread Owen Stephens
This triggered a memory of a project that was putting together a ready-to-go 
toolset for Digital Humanities - which I then couldn't remember the details of 
- but luckily Twitter was able to remember it for me (thanks to @mackymoo 
https://twitter.com/mackymoo)

The project is DH Box http://dhbox.org which tries to put together an 
environment suitable for DH work. I think that originally this was to be done 
via installation on the user's local machine, but due to the challenges of dealing 
with the variation in local environments they've now moved to a 'box in the cloud' 
approach (the change of direction is noted at 
http://dhbox.commons.gc.cuny.edu/blog/2014/dh-box-new-friend-new-platform#sthash.27THWR6E.dpbs).
To be honest I'm not 100% sure where the project is right now, as it looks like 
not much has been updated since May 2014.

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

> On 21 Oct 2014, at 15:42, Brad Coffield  wrote:
> 
> Is what you're really after is an environment pre-loaded with useful tools
> for various types of librarians? If so, maybe instead of rolling your own
> distro (and all the work and headache that involves, like a second
> full-time job) maybe create software bundles for linux? Have a website
> where you have lists of software by librarian type. Then make it easy for
> linux users to install them (repo's and what not) ((I haven't been active
> in linux for a while))
> 
> Just thinking out loud.
> 
> 
> -- 
> Brad Coffield, MLIS
> Assistant Information and Web Services Librarian
> Saint Francis University
> 814-472-3315
> bcoffi...@francis.edu


Re: [CODE4LIB] MARC reporting engine

2014-11-03 Thread Owen Stephens
The MARC XML seemed to be an archive within an archive - I had to gunzip to get 
innzmetadata.xml then rename to innzmetadata.xml.gz and gunzip again to get the 
actual xml
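
For what it's worth, a rough Python sketch of that double decompression (the 
filenames are illustrative and this is untested against the actual download):

# Sketch: the downloaded file is gzipped twice, so decompress it in two passes.
import gzip
import shutil

# First pass: what comes out is still gzipped, despite the .xml name
with gzip.open("innzmetadata.xml.gz", "rb") as outer, open("innzmetadata.inner.gz", "wb") as tmp:
    shutil.copyfileobj(outer, tmp)

# Second pass: this produces the actual MARC XML
with gzip.open("innzmetadata.inner.gz", "rb") as inner, open("innzmetadata.xml", "wb") as xml_out:
    shutil.copyfileobj(inner, xml_out)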

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

> On 3 Nov 2014, at 22:38, Robert Haschart  wrote:
> 
> I was going to echo Eric Hatcher's recommendation of Solr and SolrMarc, since 
> I'm the creator of SolrMarc.
> It does provide many of the same tools as are described in the toolset you 
> linked to,  but it is designed to write to Solr rather than to a SQL style 
> database.   Solr may or may not be more suitable for your needs than a SQL 
> database.   However I decided to download the data to see whether SolrMarc 
> could handle it.   I started with the MARCXML.gz data, ungzipped it to get a 
> .XML file, but the resulting file causes SolrMarc to blow chunks.   Either 
> I'm missing something or there is something way wrong with that data. The 
> gzipped binary MARC file works fine with the SolrMarc tools.
> 
> Creating a SolrMarc script to extract the 700 fields, plus a bash script to 
> cluster and count them, and sort by frequency took about 20 minutes.
> 
> -Bob Haschart
> 
> 
> On 11/3/2014 3:00 PM, Stuart Yeates wrote:
>> Thank you to all who responded with software suggestions. 
>> https://github.com/ubleipzig/marctools is looking like the most promising 
>> candidate so far. The more I read through the recommendations the more it 
>> dawned on me that I don't want to have to configure yet another java 
>> toolchain (yes I know, that may be personal bias).
>> 
>> Thank you to all who responded about the challenges of authority control in 
>> such collections. I'm aware of these issues. The current project is about 
>> marshalling resources for editors to make informed decisions about rather 
>> than automating the creation of articles, because there is human judgement 
>> involved in the last step I can afford to take a few authority control 
>> 'risks'
>> 
>> cheers
>> stuart
>> 
>> --
>> I have a new phone number: 04 463 5692
>> 
>> 
>> From: Code for Libraries  on behalf of raffaele 
>> messuti
>> Sent: Monday, 3 November 2014 11:39 p.m.
>> To: CODE4LIB@LISTSERV.ND.EDU
>> Subject: Re: [CODE4LIB] MARC reporting engine
>> 
>> Stuart Yeates wrote:
>>> Do any of these have built-in indexing? 800k records isn't going to fit in 
>>> memory and if building my own MARC indexer is 'relatively straightforward' 
>>> then you're a better coder than I am.
>> you could try marcdb[1] from marctools[2]
>> 
>> [1] https://github.com/ubleipzig/marctools#marcdb
>> [2] https://github.com/ubleipzig/marctools
>> 
>> 
>> --
>> raffaele


Re: [CODE4LIB] Stack Overflow

2014-11-04 Thread Owen Stephens
Another option would be a 'code4lib Q&A' site. Becky Yoose set up one for 
Coding/Cataloguing and so can comment on how much effort its been. In terms of 
asking/answering questions the use is clearly low but I think the content that 
is there is (generally) good quality and useful.

I guess the hard part of any project like this is going to be building the 
community around it. The first things that occur to me are how you encourage 
people to ask their questions on this new site rather than via existing methods, 
and how you build enough community activity around housekeeping such as 
noting duplicate questions and merging/closing. The latter might be a nice 
problem to have, but the former is where both the Library / LIS SE and the 
Digital Preservation SE fell down, and libcatcode suffers the same problem - 
just not enough activity to be a go-to destination.

I'm supportive of the idea, but I'd hate to see this go through the pain of the 
SE process only to fail for the same reasons as previous efforts in this area. 
I think we need to think about this underlying problem - but I'm not sure what 
the solution is/solutions are.

Owen


Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

> On 4 Nov 2014, at 15:34, Schulkins, Joe  
> wrote:
> 
> To be honest I absolutely hate the whole reputation and badge system for 
> exactly the reasons you outline, but I can't deny that I do find the family 
> of Stack Exchange sites extremely useful and by comparison Listservs just 
> seem very archaic to me as it's all too easy for a question (and/or its 
> answer) to drop through the cracks of a popular discussion. Are Listservs 
> really the best way to deal with help? I would even prefer a Drupal site...   
> 
> 
> Joseph Schulkins| Systems Librarian| University of Liverpool Library| PO Box 
> 123 | Liverpool L69 3DA | joseph.schulk...@liverpool.ac.uk| T 0151 794 3844 
> 
> Follow us: @LivUniLibrary Like us: LivUniLibrary Visit us: 
> http://www.liv.ac.uk/library 
> Special Collections & Archives blog: http://manuscriptsandmore.liv.ac.uk
> 
> 
> 
> 
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
> Joshua Welker
> Sent: 04 November 2014 14:43
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Stack Overflow
> 
> The concept of a library technology Stack Exchange site as a google-able 
> repository of information sounds great. However, I do have quite a few 
> reservations.
> 
> 1. Stack Exchange sites seem to naturally lead to gatekeeping, snobbishness, 
> and other troll behaviors. The reputation system built into those sites 
> really goes to a lot of folks' heads. High-ranking users seem to take pleasure 
> in shutting down questions as off-topic, redundant, etc.
> Argument and one-upmanship are actively promoted--"The previous answer sucks. 
> Here's my better answer!" This tends to attract certain (often male) 
> personalities and to repel certain (often female) personalities.
> This seems very contrary to the direction the Code4Lib community has tried to 
> move in the last few years of being more inclusive and inviting to women 
> instead of just promoting the stereotypical "IT guy" qualities that dominate 
> most IT-related discussions on the Internet. More here:
> 
> http://www.banane.com/2012/06/20/there-are-no-women-on-stackoverflow-or-are-there/
> http://michael.richter.name/blogs/why-i-no-longer-contribute-to-stackoverflow
> 
> 2. Having a Stack Exchange site might fragment the already quite small and 
> nascent library technology community. This might be an unfounded worry, but 
> it's worth consideration. A lot of Q&A takes place on this listserv, and it 
> would be awkward to try to have all this information in both places. That 
> said, searching StackExchange is much easier than searching a listserv.
> 
> 3. I echo your concerns about vendors. Libraries have a culture of protecting 
> vendors from criticism. Sure, we do lots of criticism behind closed doors, 
> but nowhere that leaves an online footprint. Often, our contracts include a 
> clause that we have to keep certain kinds of information private. I don't 
> think this is a very positive aspect of librarian culture, but it is there.
> 
> I think a year or two ago that there was a pretty long discussion on this 
> listserv about creating a Stack Exchange site.
> 
> Josh Welker
> 
> 
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
> Schulkins, Joe
> Sent: Tuesday, November 04, 2014 8:12 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] Stack Overflow

Re: [CODE4LIB] Stack Overflow

2014-11-04 Thread Owen Stephens
Thanks for that Mark. That's running on 'question2answer' which looks to have a 
reasonable amount of development going on around it 
https://github.com/q2a/question2answer/graphs/contributors (given Becky's 
comments about OSQA which still hold true)

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

> On 4 Nov 2014, at 16:05, Mark A. Matienzo  wrote:
> 
> On Tue, Nov 4, 2014 at 11:00 AM, Owen Stephens  wrote:
> 
>> Another option would be a 'code4lib Q&A' site. Becky Yoose set up one for
>> Coding/Cataloguing and so can comment on how much effort its been. In terms
>> of asking/answering questions the use is clearly low but I think the
>> content that is there is (generally) good quality and useful.
>> 
>> I guess the hard part of any project like this is going to be building the
>> community around it. The first things that occur to me is how you encourage
>> people to ask the question on this new site, rather than via existing
>> methods and how do you build enough community activity around housekeeping
>> such as noting duplicate questions and merging/closing. The latter might be
>> a nice problem to have, but the former is where both the Library / LIS SE
>> and the Digital Preservation SE fell down, and libcatcode suffers the same
>> problem - just not enough activity to be a go-to destination.
> 
> 
> I would add that the Digital Preservation SE has been reinstantiated as
> Digital Preservation Q&A <http://qanda.digipres.org/>, which is organized
> and supported by the Open Planets Foundation and the National Digital
> Stewardship Alliance.
> 
> Mark A. Matienzo 
> Director of Technology, Digital Public Library of America


[CODE4LIB] Automatically updating documentation with screenshots

2015-01-26 Thread Owen Stephens
I work on a web application, and when we release a new version there are often 
updates to make to existing user documentation - especially screenshots, where 
unrelated changes (e.g. the addition of a new top-level menu item) can make it 
desirable to refresh whole sets of screenshots across all the documentation.

I'm looking at whether we could automate the generation of screenshots somehow 
which has taken me into documentation tools such as Sphinx 
[http://sphinx-doc.org] and Dexy [http://dexy.it]. However, ideally I want 
something simple enough for the application support staff to be able to use.

Anyone done/tried anything like this?

Cheers

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936


Re: [CODE4LIB] Automatically updating documentation with screenshots

2015-01-26 Thread Owen Stephens
Thanks all - I'm looking at both Selenium and Casperjs now.

I also came across a plugin for 'Robot Framework' [http://robotframework.org] 
which allows you to grab screenshots (via Selenium) and annotate with notes - 
along the lines that Ross suggested. The plugin is 'Selenium2Screenshots' 
[https://github.com/datakurre/robotframework-selenium2screenshots]
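
In case it's useful to anyone else looking at this, the basic Selenium part is 
very small - a rough Python sketch (the URLs, window size and output names here 
are just made up for illustration):

from selenium import webdriver

# Pages we want screenshots of - placeholder URLs
pages = {
    "home": "http://example.org/app/home",
    "search": "http://example.org/app/search",
}

driver = webdriver.Firefox()
driver.set_window_size(1280, 800)

for name, url in pages.items():
    driver.get(url)
    driver.save_screenshot(name + ".png")

driver.quit()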

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

> On 26 Jan 2015, at 13:16, Mads Villadsen  wrote:
> 
> I have used casperjs for this purpose. A small script that loads urls at 
> multiple different resolutions/user agents and takes a screenshot of each of 
> them.
> 
> Regards
> 
> -- 
> Mads Villadsen 
> Statsbiblioteket
> It-udvikler


Re: [CODE4LIB] Automatically updating documentation with screenshots

2015-01-26 Thread Owen Stephens
... and further to this I've just found a neat Chrome plugin which will record 
a set of actions/tests as a CasperJS script, including screenshots - my first 
impressions are pretty positive - the code produced looks pretty clean.

The plugin is called 'Resurrectio' [https://github.com/ebrehault/resurrectio 
<https://github.com/ebrehault/resurrectio>]

Cheers

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

> On 26 Jan 2015, at 13:48, Owen Stephens  wrote:
> 
> Thanks all - I'm looking at both Selenium and Casperjs now.
> 
> I also came across a plugin for 'Robot Framework' [http://robotframework.org 
> <http://robotframework.org/>] which allows you to grab screenshots (via 
> Selenium) and annotate with notes - along the lines that Ross suggested. The 
> plugin is 'Selenium2Screenshots' 
> [https://github.com/datakurre/robotframework-selenium2screenshots 
> <https://github.com/datakurre/robotframework-selenium2screenshots>]
> 
> Owen
> 
> Owen Stephens
> Owen Stephens Consulting
> Web: http://www.ostephens.com <http://www.ostephens.com/>
> Email: o...@ostephens.com <mailto:o...@ostephens.com>
> Telephone: 0121 288 6936
> 
>> On 26 Jan 2015, at 13:16, Mads Villadsen > <mailto:m...@statsbiblioteket.dk>> wrote:
>> 
>> I have used casperjs for this purpose. A small script that loads urls at 
>> multiple different resolutions/user agents and takes a screenshot of each of 
>> them.
>> 
>> Regards
>> 
>> -- 
>> Mads Villadsen mailto:m...@statsbiblioteket.dk>>
>> Statsbiblioteket
>> It-udvikler
> 


Re: [CODE4LIB] Code4LibCon video crew thanks

2015-02-17 Thread Owen Stephens
Apologies for a +1 message, but you know... +1 and some

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

> On 13 Feb 2015, at 18:00, Cary Gordon  wrote:
> 
> I want to deeply thank Ashley Blewer, Steven Anderson and Josh Wilson for 
> running the video streaming and capture at Code4LibCon in Portland. Because 
> of you, we had great video in real time (and I got to actually watch the 
> presentations). I also want to again thank Riley Childs, who could not make 
> it this year. Riley moved the bar up last year by putting together our 
> YouTube presence.
> 
> For the second year running, we requested and were not allowed to setup and 
> test the day before, and for the second year running lost part of the opening 
> session. Fortunately, we did capture most of what did not get streamed on 
> Tuesday, and I will put that online next week. There is always next year.
> 
> Thanks,
> 
> Cary


Re: [CODE4LIB] linked data question

2015-02-26 Thread Owen Stephens
I highly recommend Chapter 6 of the Linked Data book, which details different 
design approaches for Linked Data applications - section 6.3 
(http://linkeddatabook.com/editions/1.0/#htoc84) summarises the approaches as:

1. Crawling Pattern
2. On-the-fly dereferencing pattern
3. Query federation pattern

Generally my view would be that (1) and (2) are viable approaches for different 
applications, but that (3) is generally a bad idea (having been through 
federated search before!)
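
To make (2) a bit more concrete, here is a rough Python sketch of on-the-fly 
dereferencing with a simple local cache, along the lines of the options Esmé 
lists below - the URI, the Accept header and the cache directory are just 
illustrative, and a real application would also want error handling and cache 
expiry:

import hashlib
import os
import urllib.request

CACHE_DIR = "rdf_cache"

def dereference(uri):
    os.makedirs(CACHE_DIR, exist_ok=True)
    key = hashlib.sha1(uri.encode("utf-8")).hexdigest()
    path = os.path.join(CACHE_DIR, key + ".ttl")
    # Reuse the cached copy if we've fetched this URI before
    if os.path.exists(path):
        with open(path, encoding="utf-8") as cached:
            return cached.read()
    # Otherwise dereference the URI, asking for Turtle
    request = urllib.request.Request(uri, headers={"Accept": "text/turtle"})
    with urllib.request.urlopen(request) as response:
        data = response.read().decode("utf-8")
    with open(path, "w", encoding="utf-8") as cached:
        cached.write(data)
    return data

# e.g. dereference("http://viaf.org/viaf/54178741")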

Owen



Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

> On 26 Feb 2015, at 14:40, Eric Lease Morgan  wrote:
> 
> On Feb 25, 2015, at 2:48 PM, Esmé Cowles  wrote:
> 
>>> In the non-techie library world, linked data is being talked about (perhaps 
>>> only in listserv traffic) as if the data (bibliographic data, for instance) 
>>> will reside on remote sites (as a SPARQL endpoint??? We don't know the 
>>> technical implications of that), and be displayed by [the local catalog/the 
>>> centralized inter-national catalog] by calling data from that 
>>> remote site. But the original question was how the data on those remote 
>>> sites would be discovered - how can I start my search by searching for 
>>> that remote content?  I assume there has to be a database implementation 
>>> that visits that data and pre-indexes it for it to be searchable, and 
>>> therefore the index has to be local (or global a la Google or OCLC or its 
>>> bibliographic-linked-data equivalent). 
>> 
>> I think there are several options for how this works, and different 
>> applications may take different approaches.  The most basic approach would 
>> be to just include the URIs in your local system and retrieve them any time 
>> you wanted to work with them.  But the performance of that would be 
>> terrible, and your application would stop working if it couldn't retrieve 
>> the URIs.
>> 
>> So there are lots of different approaches (which could be combined):
>> 
>> - Retrieve the URIs the first time, and then cache them locally.
>> - Download an entire data dump of the remote vocabulary and host it locally.
>> - Add text fields in parallel to the URIs, so you at least have a label for 
>> it.
>> - Index the data in Solr, Elasticsearch, etc. and use that most of the time, 
>> esp. for read-only operations.
> 
> 
> Yes, exactly. I believe Esmé has articulated the possible solutions well. 
> escowles++  —ELM


Re: [CODE4LIB] eebo

2015-06-05 Thread Owen Stephens
Hi Eric,

I’ve worked with EEBO as part of the Jisc Historical Texts 
(https://historicaltexts.jisc.ac.uk/home) platform - which provides access to 
EEBO and other collections for UK Universities. My work was around the metadata 
and search of metadata and full text and display of results. I was mainly 
looking at metadata but did some digging into the TEI files to see how the 
markup could be used to extract metadata (e.g. presence of illustrations in the 
text).
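
As an example of the sort of thing I mean, here is a minimal sketch using Python 
and lxml to check a transcription for illustrations by looking for TEI figure 
elements - this assumes a TEI P5 file with the usual namespace (the file name is 
just a placeholder, and the older SGML/P4 files would need the un-namespaced 
element name instead):

from lxml import etree

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

tree = etree.parse("A00001.xml")  # placeholder file name
figures = tree.findall(".//tei:figure", namespaces=TEI_NS)
print("Contains illustrations:", len(figures) > 0, "-", len(figures), "figure elements")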

I was lucky (?!) enough to have access to the MARC records, but I did also do 
some work looking at the metadata included in the TEI files.

If there is anything I can help with I’d be happy to.

 The people who worked with the files in detail were a UK s/w development 
company Knowledge Integration (http://www.k-int.com/) - I can give you a 
contact there if that would be helpful.

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

> On 5 Jun 2015, at 13:10, Eric Lease Morgan  wrote:
> 
> Does anybody here have experience reading the SGML/XML files representing the 
> content of EEBO? 
> 
> I’ve gotten my hands on approximately 24 GB of SGML/XML files representing 
> the content of EEBO (Early English Books Online). This data does not include 
> page images. Instead it includes metadata of various ilks as well as the 
> transcribed full text. I desire to reverse engineer the SGML/XML in order to: 
> 1) provide an alternative search/browse interface to the collection, and 2) 
> support various types of text mining services. 
> 
> While I am making progress against the data, it would be nice to learn of 
> other people’s experience so I do not not re-invent the wheel (too many 
> times). ‘Got ideas?
> 
> —
> Eric Lease Morgan
> University Of Notre Dame


[CODE4LIB] Global Open Knowledgebase APIs

2015-06-08 Thread Owen Stephens
Dear all,

GOKb, the Global Open Knowledgebase, is a community-managed project that aims 
to describe electronic journals and books, publisher packages, and platforms in 
a way that will be familiar to librarians who have worked with electronic 
resources. I’ve been working on the project since it started, working with 
others to gather requirements, develop the underlying data models and specify 
functionality for the system.

GOKb opened to ‘public preview’ in January 2015, and you can signup for an 
account and access the service at https://gokb.kuali.org/gokb/ 
<https://gokb.kuali.org/gokb/>

Several hundred ejournal packages, and associated information about the 
ejournal titles, platforms and organisations have been added to the 
knowledgebase over the past few months. Alongside this work of adding content 
we have also opened up APIs to interact with the service.

We are interested in:

* Understanding how people would like to use data from GOKb via APIs (or other 
mechanisms)
* Getting some use of the initial APIs and getting feedback on these 
* Getting feedback on other APIs people would like to see

The current APIs we support are:

The ‘Coreference’ service
The main aim of this API is to provide back a list of identifiers associated 
with a title. The service allows you to provide a journal identifier (such as 
an ISSN) and get back basic information about the journal including title and 
other identifiers associated with the journal (other ISSNs, DOIs, publisher 
identifiers etc.). 

Documentation: https://github.com/k-int/gokb-phase1/wiki/Co-referencing-Detail 
<https://github.com/k-int/gokb-phase1/wiki/Co-referencing-Detail>
Access: https://gokb.kuali.org/gokb/coreference/index 
<https://gokb.kuali.org/gokb/coreference/index>
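
As a rough illustration of calling the coreference service from Python (using 
the requests library) - the parameter name, the example ISSN and the assumption 
of a JSON response here are mine for the sake of example, so please check the 
documentation above for the definitive details:

import requests

response = requests.get(
    "https://gokb.kuali.org/gokb/coreference/index",
    params={"id": "0036-8075"},  # assumed parameter name and example ISSN
)
response.raise_for_status()
for record in response.json():  # assuming a JSON list of matching titles
    print(record)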

OAI Interfaces
The main aim of this API is to enable other services to obtain data from GOKb 
on an ongoing basis. Information about ejournal packages, titles and 
organisations can be obtained via this service

Documentation: 
https://github.com/k-int/gokb-phase1/wiki/OAI-Interfaces-for-Synchronization 
<https://github.com/k-int/gokb-phase1/wiki/OAI-Interfaces-for-Synchronization>
Access: http://gokb.kuali.org/gokb/oai <http://gokb.kuali.org/gokb/oai>
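
For anyone who wants to try this out quickly, a minimal Python sketch of 
harvesting from the OAI-PMH endpoint using the standard ListRecords verb and the 
baseline oai_dc metadata prefix (the documentation above covers the 
GOKb-specific formats and arguments):

import requests
import xml.etree.ElementTree as ET

OAI_URL = "http://gokb.kuali.org/gokb/oai"
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}

while True:
    response = requests.get(OAI_URL, params=params)
    response.raise_for_status()
    root = ET.fromstring(response.content)
    for record in root.findall(".//oai:record", NS):
        identifier = record.find(".//oai:identifier", NS)
        print(identifier.text if identifier is not None else "(no identifier)")
    # Follow resumption tokens until the full list has been harvested
    token = root.find(".//oai:resumptionToken", NS)
    if token is None or not (token.text or "").strip():
        break
    params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}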

Add/Update API
This API supports adding and updating data in GOKb. You can add new, or update 
existing, Organisations and Platforms. You can add additional identifiers to 
Journal titles.

Documentation: 
https://github.com/k-int/gokb-phase1/wiki/Integration---Telling-GOKb-about-new-or-corresponding-resources-and-local-identifiers
 
<https://github.com/k-int/gokb-phase1/wiki/Integration---Telling-GOKb-about-new-or-corresponding-resources-and-local-identifiers>

We also have a SPARQL endpoint available on our test service (which contains 
test data only). The SPARQL endpoint is at http://test-gokb.kuali.org/sparql 
<http://test-gokb.kuali.org/sparql>, and a set of example queries are given at 
https://github.com/k-int/gokb-phase1/wiki/Sample-SPARQL 
<https://github.com/k-int/gokb-phase1/wiki/Sample-SPARQL>

Feedback on any/all of this would be very welcome - either to the list for 
discussion, or directly to me. We want to make sure we can provide useful data 
and services and hope you can help us do this.

Best wishes,

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936


Re: [CODE4LIB] eebo [developments]

2015-06-08 Thread Owen Stephens
Great stuff Eric.

I’ve just seen another interesting take based (mainly) on data in the TCP-EEBO 
release 
https://scalablereading.northwestern.edu/2015/06/07/shakespeare-his-contemporaries-shc-released/

It includes mention of MorphAdorner[1], which does some clever stuff around 
tagging parts of speech, spelling variations, lemmata etc., and another tool 
which I hadn’t come across before, AnnoLex[2], "for the correction and 
annotation of lexical data in Early Modern texts”.

This paper[3] from Alistair Baron and Andrew Hardie at the University of 
Lancaster in the UK about preparing EEBO-TCP texts for corpus-based analysis 
may also be of interest, and the team at Lancaster have developed a tool called 
VARD which supports pre-processing texts[4]

Owen

[1] http://morphadorner.northwestern.edu
[2] http://annolex.at.northwestern.edu
[3] http://eprints.lancs.ac.uk/60272/1/Baron_Hardie.pdf
[4] http://ucrel.lancs.ac.uk/vard/about/

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

> On 7 Jun 2015, at 18:48, Eric Lease Morgan  wrote:
> 
> Here some of developments with my playing with the EEBO data. 
> 
> I used the repository on Box to get my content, and I mirrored it locally. 
> [1, 2] I then looped through the content using XPath to extract rudimentary 
> metadata, thus creating a “catalog” (index). Along the way I calculated the 
> number of words in each document and saved that as a field of each "record". 
> Being a tab-delimited file, it is trivial to import the catalog into my 
> favorite spreadsheet, database, editor, or statistics program. This allowed 
> me to browse the collection. I then used grep to search my catalog, and save 
> the results to a file. [5] I searched for Richard Baxter. [6, 7, 8]. I then 
> used an R script to graph the numeric data of my search results. Currently, 
> there are only two types: 1) dates, and 2) number of words. [9, 10, 11, 12] 
> From these graphs I can tell that Baxter wrote a lot of relatively short 
> things, and I can easily see when he published many of his works. (He 
> published a lot around 1680 but little in 1665.) I then transformed the 
> search resu!
 lt!
> s into a browsable HTML table. [13] The table has hidden features. (Can you 
> say, “Usability?”) For example, you can click on table headers to sort. This 
> is cool because I want sort things by number of words. (Number of pages 
> doesn’t really tell me anything about length.) There is also a hidden link to 
> the left of each record. Upon clicking on the blank space you can see 
> subjects, publisher, language, and a link to the raw XML. 
> 
> For a good time, I then repeated the process for things Shakespeare and 
> things astronomy. [14, 15] Baxter took me about twelve hours worth of work, 
> not counting the caching of the data. Combined, Shakespeare and astronomy 
> took me less than five minutes. I then got tired.
> 
> My next steps are multi-faceted and presented in the following incomplete 
> unordered list:
> 
>  * create browsable lists - the TEI metadata is clean and
>consistent. The authors and subjects lend themselves very well to
>the creation of browsable lists.
> 
>  * CGI interface - The ability to search via Web interface is
>imperative, and indexing is a prerequisite.
> 
>  * transform into HTML - TEI/XML is cool, but…
> 
>  * create sets - The collection as a whole is very interesting,
>but many scholars will want sub-sets of the collection. I will do
>this sort of work, akin to my work with the HathiTrust. [16]
> 
>  * do text analysis - This is really the whole point. Given the
>full text combined with the inherent functionality of a computer,
>additional analysis and interpretation can be done against the
>corpus or its subsets. This analysis can be based the counting of
>words, the association of themes, parts-of-speech, etc. For
>example, I plan to give each item in the collection a colors,
>“big” names, and “great” ideas coefficient. These are scores
>denoting the use of researcher-defined “themes”. [17, 18, 19] You
>can see how these themes play out against the complete writings
>of “Dead White Men With Three Names”. [20, 21, 22]
> 
> Fun with TEI/XML, text mining, and the definition of librarianship.
> 
> 
> [1] Box - http://bit.ly/1QcvxLP
> [2] mirror - http://dh.crc.nd.edu/sandbox/eebo-tcp/xml/
> [3] xpath script - http://dh.crc.nd.edu/sandbox/eebo-tcp/bin/xml2tab.pl
> [4] catalog (index) - http://dh.crc.nd.edu/sandbox/eebo-tcp/catalog.txt
> [5] search results - http://dh.crc.nd.edu/sandbox/eebo-tcp/baxter/baxter.txt
> [6] Baxter at VIAF - http://viaf.org/viaf/54178741
> [7] Baxter at WorldCat - http://www.worldcat.org/wcidentit

Re: [CODE4LIB] eebo [perfect texts]

2015-06-09 Thread Owen Stephens
And some of the researchers definitely care about this (authority control, high 
quality descriptive metadata). I went to a hack day focussing on the EEBO-TCP 
Phase 1 release (these texts). I mentioned to one of the researchers (not a 
librarian) that I had access to some MARC records which described the works. 
Their immediate response was “Ah - but which MARC records, because they aren’t 
all of the same quality”!

There are good cataloguing records for the works but they have not been made 
available under an open licence alongside the transcribed texts. Probably the 
highest quality records are those in the English Short Title Catalogue (ESTC) 
http://estc.bl.uk.

There have been some great steps forward in the last few years, but I still 
feel libraries need to increase the amount they are doing to publish metadata 
under explicitly open licences.

Owen


Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

> On 8 Jun 2015, at 23:23, Stuart A. Yeates  wrote:
> 
> Another thing that could usefully be done is significantly better authority
> control. Authors, works, geographical places, subjects, etc, etc.
> 
> Good core librarianship stuff that is essentially orthogonal to all the
> other work that appears to be happening.
> 
> cheers
> stuart
> 
> --
> ...let us be heard from red core to black sky
> 
> On Tue, Jun 9, 2015 at 12:42 AM, Eric Lease Morgan  wrote:
> 
>> On Jun 8, 2015, at 7:32 AM, Owen Stephens  wrote:
>> 
>>> I’ve just seen another interesting take based (mainly) on data in the
>> TCP-EEBO release:
>>> 
>>> 
>> https://scalablereading.northwestern.edu/2015/06/07/shakespeare-his-contemporaries-shc-released/
>>> 
>>> It includes mention of MorphAdorner[1] which does some clever stuff
>> around tagging parts of speech, spelling variations, lemmata etc. and
>> another tool which I hadn’t come across before AnnoLex[2] "for the
>> correction and annotation of lexical data in Early Modern texts”.
>>> 
>>> This paper[3] from Alistair Baron and Andrew Hardie at the University of
>> Lancaster in the UK about preparing EEBO-TCP texts for corpus-based
>> analysis may also be of interest, and the team at Lancaster have developed
>> a tool called VARD which supports pre-processing texts[4]
>>> 
>>> [1] http://morphadorner.northwestern.edu
>>> [2] http://annolex.at.northwestern.edu
>>> [3] http://eprints.lancs.ac.uk/60272/1/Baron_Hardie.pdf
>>> [4] http://ucrel.lancs.ac.uk/vard/about/
>> 
>> 
>> All of this is really very interesting. Really. At the same time, there
>> seems to be a WHOLE lot of effort spent on cleaning and normalizing data,
>> and very little done to actually analyze it beyond “close reading”. The
>> final goal of all these interfaces seem to be refined search. Frankly, I
>> don’t need search. And the only community who will want this level of
>> search will be the scholarly scholar. “What about the undergraduate
>> student? What about the just more than casual reader? What about the
>> engineer?” Most people don’t know how or why parts-of-speech are important
>> let alone what a lemma is. Nor do they care. I can find plenty of things. I
>> need (want) analysis. Let’s assume the data is clean — or rather, accept
>> the fact that there is dirty data akin to the dirty data created through
>> OCR and there is nothing a person can do about it — lets see some automated
>> comparisons between texts. Examples might include:
>> 
>>  * this one is longer
>>  * this one is shorter
>>  * this one includes more action
>>  * this one discusses such & such theme more than this one
>>  * so & so theme came and went during a particular time period
>>  * the meaning of this phrase changed over time
>>  * the author’s message of this text is…
>>  * this given play asserts the following facts
>>  * here is a map illustrating where the protagonist went when
>>  * a summary of this text includes…
>>  * this work is fiction
>>  * this work is non-fiction
>>  * this work was probably influenced by…
>> 
>> We don’t need perfect texts before analysis can be done. Sure, perfect
>> texts help, but they are not necessary. Observations and generalization can
>> be made even without perfectly transcribed texts.
>> 
>> —
>> ELM
>> 


Re: [CODE4LIB] Desiring Advice for Converting OCR Text into Metadata and/or a Database

2015-06-18 Thread Owen Stephens
It may depend on the format of the PDF, but I’ve used the Scraperwiki Python 
Module ‘pdf2xml’ function to extract text data from PDFs in the past. There is 
a write up (not by me) at 
http://schoolofdata.org/2013/08/16/scraping-pdfs-with-python-and-the-scraperwiki-module/
 
<http://schoolofdata.org/2013/08/16/scraping-pdfs-with-python-and-the-scraperwiki-module/>,
 and an example of how I’ve used it at 
https://github.com/ostephens/british_library_directory_of_library_codes/blob/master/scraper.py
 
<https://github.com/ostephens/british_library_directory_of_library_codes/blob/master/scraper.py>
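
The shape of it, from memory, is something like the sketch below (the linked 
scraper has the working version; the URL here is a placeholder, and the function 
name may differ slightly between versions of the scraperwiki library):

import urllib.request
import lxml.etree
import scraperwiki

pdf_url = "http://example.org/some-bibliography.pdf"  # placeholder URL
pdf_data = urllib.request.urlopen(pdf_url).read()

# pdftoxml() runs the PDF through pdftohtml and returns XML with one <text>
# element per positioned fragment of text on the page
xml_text = scraperwiki.pdftoxml(pdf_data)
root = lxml.etree.fromstring(xml_text.encode("utf-8"))

# The left/top attributes are what let you reconstruct columns and rows
for fragment in root.findall(".//text"):
    print(fragment.attrib.get("left"), fragment.attrib.get("top"), fragment.text)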

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

> On 18 Jun 2015, at 17:02, Matt Sherman  wrote:
> 
> Hi Code4Libbers,
> 
> I am working with a colleague on a side project which involves some scanned
> bibliographies and making them more web searchable/sortable/browse-able.
> I am quite familiar with the metadata and organization aspects we
> need, but I am at a bit of a loss on how to automate the process of putting
> the bibliography in a more structured format so that we can avoid going
> through hundreds of pages by hand.  I am pretty sure regular expressions
> are needed, but I have not had an instance where I need to automate
> extracting data from one file type (PDF OCR or text extracted to Word doc)
> and place it into another (either a database or an XML file) with some
> enrichment.  I would appreciate any suggestions for approaches or tools to
> look into.  Thanks for any help/thoughts people can give.
> 
> Matt Sherman


Re: [CODE4LIB] Processing Circ data

2015-08-05 Thread Owen Stephens
Another option might be to use OpenRefine http://openrefine.org - this should 
easily handle 250,000 rows. I find it good for basic data analysis, and there 
are extensions which offer some visualisations (e.g. the VIB BITs extension 
which will plot simple data using d3 
https://www.bits.vib.be/index.php/software-overview/openrefine 
<https://www.bits.vib.be/index.php/software-overview/openrefine>)

I’ve written an introduction to OpenRefine available at 
http://www.meanboyfriend.com/overdue_ideas/2014/11/working-with-data-using-openrefine/
 
<http://www.meanboyfriend.com/overdue_ideas/2014/11/working-with-data-using-openrefine/>

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

> On 5 Aug 2015, at 21:07, Harper, Cynthia  wrote:
> 
> Hi all. What are you using to process circ data for ad-hoc queries.  I 
> usually extract csv or tab-delimited files - one row per item record, with 
> identifying bib record data, then total checkouts over the given time 
> period(s).  I have been importing these into Access then grouping them by bib 
> record. I think that I've reached the limits of scalability for Access for 
> this project now, with 250,000 item records.  Does anyone do this in R?  My 
> other go-to- software for data processing is RapidMiner free version.  Or do 
> you just use MySQL or other SQL database?  I was looking into doing it in R 
> with RSQLite (just read about this and sqldf  
> http://www.r-bloggers.com/make-r-speak-sql-with-sqldf/ ) because I'm sure my 
> IT department will be skeptical of letting me have MySQL on my desktop.  
> (I've moved into a much more users-don't-do-real-computing kind of 
> environment).  I'm rusty enough in R that if anyone will give me some 
> start-off data import code, that would be great.
> 
> Cindy Harper
> E-services and periodicals librarian
> Virginia Theological Seminary
> Bishop Payne Library
> 3737 Seminary Road
> Alexandria VA 22304
> char...@vts.edu<mailto:char...@vts.edu>
> 703-461-1794


Re: [CODE4LIB] Protocol-relative URLs in MARC

2015-08-17 Thread Owen Stephens
In theory the 1st indicator dictates the protocol used, and 4 = HTTP. However, 
in all examples on http://www.loc.gov/marc/bibliographic/bd856.html, despite the 
indicator being used, the protocol part of the URI is then repeated in the 
$u field.

You can put ‘7’ in the 1st indicator, then use subfield $2 to define other 
methods.

Since ‘http’ is one of the preset protocols but https is not, I guess in theory 
this means you should use something like

856 70 $uhttps://example.com$2https

I’d be pretty surprised if in practice people don’t just do:

856 40 $uhttps://example.com

Owen


Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

> On 17 Aug 2015, at 21:41, Stuart A. Yeates  wrote:
> 
> I'm in the middle of some work which includes touching the 856s in lots of
> MARC records pointing to websites we control. The websites are available on
> both https://example.org/ and http://example.org/
> 
> Can I put //example.org/ in the MARC or is this contrary to the standard?
> 
> Note that there is a separate question about whether various software
> systems support this, but that's entirely secondary to the question of the
> standard.
> 
> cheers
> stuart
> --
> ...let us be heard from red core to black sky


Re: [CODE4LIB] Protocol-relative URLs in MARC

2015-08-17 Thread Owen Stephens
Sorry - addressing the actual question, rather than the one in my head, the 856 
field "is also repeated when more than one access method is used” - so my 
reading is you should be doing both:

856 40 $uhttp://example.com
856 70 $uhttps://example.com$2https
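
For what it's worth, a quick sketch of creating both fields with pymarc (this 
uses the pymarc 4.x list-style subfields; newer pymarc versions use Subfield 
objects instead):

from pymarc import Field

http_field = Field(
    tag="856",
    indicators=["4", "0"],
    subfields=["u", "http://example.com"],
)
https_field = Field(
    tag="856",
    indicators=["7", "0"],
    subfields=["u", "https://example.com", "2", "https"],
)
# then record.add_field(http_field, https_field) on a pymarc Record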

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

> On 18 Aug 2015, at 00:00, Owen Stephens  wrote:
> 
> In theory the 1st indicator dictates the protocol used and 4 =HTTP. However, 
> in all examples on http://www.loc.gov/marc/bibliographic/bd856.html, despite 
> the indicator being used, the protocol part of the URI it is then repeated in 
> the $u field.
> 
> You can put ‘7’ in the 1st indicator, then use subfield $2 to define other 
> methods.
> 
> Since only ‘http’ is one of the preset protocols, not https, I guess in 
> theory this means you should use something like
> 
> 856 70 $uhttps://example.com$2https
> 
> I’d be pretty surprised if in practice people don’t just do:
> 
> 856 40 $uhttps://example.com
> 
> Owen
> 
> 
> Owen Stephens
> Owen Stephens Consulting
> Web: http://www.ostephens.com
> Email: o...@ostephens.com
> Telephone: 0121 288 6936
> 
>> On 17 Aug 2015, at 21:41, Stuart A. Yeates  wrote:
>> 
>> I'm in the middle of some work which includes touching the 856s in lots of
>> MARC records pointing to websites we control. The websites are available on
>> both https://example.org/ and http://example.org/
>> 
>> Can I put //example.org/ in the MARC or is this contrary to the standard?
>> 
>> Note that there is a separate question about whether various software
>> systems support this, but that's entirely secondary to the question of the
>> standard.
>> 
>> cheers
>> stuart
>> --
>> ...let us be heard from red core to black sky
> 


Re: [CODE4LIB] Job: Wine Loving Developer at University of California, Davis

2015-12-11 Thread Owen Stephens
That may well be true, but ‘getting the job done’ isn’t the only aspect of a 
crowdsourcing project. It can be used to engage an audience more deeply in the 
collection and give them some investment in it. This can help with overall 
visibility of the collection on the web (through those people who have engaged 
sharing what they are doing/seeing etc.), and future use, and be a platform for 
further projects.

A project like this could also offer a way of experimenting with crowdsourcing 
in a low risk way. And of course the developer is needed for the visualisation 
aspect anyway, so the recruitment needs to happen and a wage needs to be paid 
anyway ...

Whether all this balances out against the economics/efficiency of getting the 
job done in the cheapest possible way is a judgement that needs to be made, but 
I don’t think the simple economic argument is the only one in play here.

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

> On 10 Dec 2015, at 23:42, James Morley  wrote:
> 
> I agree with Thomas's logic, if not the maths (surely $2,000?)
> 
> I was going to do a few myself but it looks like comments have been disabled 
> on the Flickr images?
> 
> 
> From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Thomas 
> Krichel [kric...@openlib.org]
> Sent: 10 December 2015 23:17
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Job: Wine Loving Developer  at University of 
> California, Davis
> 
>  j...@code4lib.org writes
> 
> 
>> **PROJECT DETAILS**
>> The UC Davis University Library is launching a project to digitize the
>> [Amerine wine label 
>> collection](https://www.flickr.com/photos/brantley/sets/72
>> 157655817440104/with/21116552632/)
> 
>  Some look hard to read.
> 
>> and engage the public to transcribe the information contained on the
>> labels and associated annotations.
> 
>  This may take a long time. I suggest rather than doing that, take
>  somebody in a low-income country who speaks French, say, and who will
>  type all the data in. That way you get consistency in the data.  I
>  live in Siberia, I can find somebody there. Once this data is in a
>  simple text file, you can use in-house staff to attach it to the
>  label images in your systems.
> 
>  Crowdsource sounds cool, but for 4000 labels it makes no sense.
>  If the typist gets $10/h, and gets 20 labels done in 1h, we
>  are talking $200. The visit you are planning for your developer
>  will cost that much.
> --
> 
>  Cheers,
> 
>  Thomas Krichel  http://openlib.org/home/krichel
>  skype:thomaskrichel


Re: [CODE4LIB] searching metadata vs searching content

2016-01-28 Thread Owen Stephens
To share the practice from a project I work on - the Jisc Historical Texts 
platform[1] which provides searching across digitised texts from the 16th to 
19th centuries. In this case we had the option to build the search application 
from scratch, rather than using a product such as ContentDM etc. I should say 
that all the technical work was done by K-Int [2] and Gooii [3]; I was there to 
advise on metadata and user requirements, so the following is based on my 
understanding of how the system works, and any errors are down to me :)

There are currently three major collections within the Historical Texts 
platform, with different data sources behind each one. In general the data we 
have for each collection consists of MARC metadata records, full text in XML 
documents (either from transcription or from OCR processes) and image files of 
the pages. 

The platform is built using the ElasticSearch [4] (ES) indexing software (as 
with Solr, this is built on top of Lucene).

We structure the data we index in ES in two layers - the ‘publication’ record, 
which is essentially where all the MARC metadata lives (although not as MARC - 
we transform this to an internal scheme), and the ‘page’ records - one record 
per page in the item. The text content lives in the page record, along with 
links to the image files for the page. The ‘page’ records are all what ES calls 
‘child’ records of the relevant publication record. We make this relationship 
through shared IDs in the MARC records and the XML fulltext documents.

We create a whole range of indexes from this data. Obviously field-specific 
searches like title or author only search the relevant metadata fields. But we 
also have a (default) ’search all’ option which searches through all the 
metadata and fulltext. If the user wants to search the text only, they check an 
option and we limit the search to only text from records of the ‘page’ type.

The results the user gets initially are always the publication level records - 
so essentially your results list is a list of books. For each result you can 
view ‘matches in text’ which shows snippets of where your search term appears 
in the fulltext. You can then either click to view the whole book, or click the 
relevant page from the list of snippets. When you view the book, the software 
retrieves all the ‘page’ records for the book, and from the page records can 
retrieve the image files. When the user goes to the book viewer, we also carry 
over the search terms from their search, so they can see the same text snippets 
of where the terms appear alongside the book viewer - so the user can navigate 
to the pages which contain the search terms easily.
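
To make the parent/child part a bit more concrete, here is a rough sketch of the 
kind of query involved - this is not the actual Historical Texts code, the 
index, type and field names are invented, and it assumes the older _parent style 
of parent/child mapping in Elasticsearch:

import json
import requests

query = {
    "query": {
        "bool": {
            "should": [
                # match on publication-level metadata fields...
                {"match": {"title": "astronomy"}},
                # ...or on the full text held in the child 'page' records
                {
                    "has_child": {
                        "type": "page",
                        "query": {"match": {"text": "astronomy"}},
                    }
                },
            ]
        }
    }
}

response = requests.post(
    "http://localhost:9200/historicaltexts/publication/_search",
    data=json.dumps(query),
    headers={"Content-Type": "application/json"},
)
print(response.json()["hits"]["total"])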

For more on the ES indexing side of this, Rob Tice from Knowledge Integration 
did a talk about the use of ES in this context at the London Elasticsearch 
usergroup [5]. Unfortunately the interface itself requires a login, but if you 
want to get a feel for how this all works in the UI, there is also a screencast 
which gives an overview of the UI available [6].

Best wishes,

Owen

1. https://historicaltexts.jisc.ac.uk
2. http://www.k-int.com
3. http://www.gooii.com
4. https://www.elastic.co
5. 
http://www.k-int.com/Rob-Tice-Elastic-London-complex-modelling-of-rich-text-data-in-Elasticsearch
6. http://historicaltexts.jisc.ac.uk/support

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

> On 27 Jan 2016, at 00:30, Laura Buchholz  wrote:
> 
> Hi all,
> 
> I'm trying to understand how digital library systems work when there is a
> need to search both metadata and item text content (plain text/full text),
> and when the item is made up of more than one file (so, think a digitized
> multi-page yearbook or newspaper). I'm not looking for answers to a
> specific problem, really, just looking to know what is the current state of
> community practice.
> 
> In our current system (ContentDM), the "full text" of something lives in
> the metadata record, so it is indexed and searched along with the metadata,
> and essentially treated as if it were metadata. (Correct?). This causes
> problems in advanced searching and muddies the relationship between what is
> typically a descriptive metadata record and the file that is associated
> with the record. It doesn't seem like a great model for the average digital
> library. True? I know the answer is "it depends", but humor me... :)
> 
> If it isn't great, and there are better models, what are they? I was taught
> METS in school, and based on that, I'd approach the metadata in a METS or
> METS-like fashion. But I'm unclear on the steps from having a bunch of METS
> records that include descriptive metadata and pointers to text files of the
> OCR (we don't, but if we did...) to indexing and providing results to
> users. I think an

Re: [CODE4LIB] Directories of OAI-PMH repositories

2013-02-08 Thread Owen Stephens
Also see OpenDOAR

http://www.opendoar.org
 
We used this listing when building Core http://core.kmi.open.ac.uk/search - 
which aggregates and does full-text analysis and similarity matching across OA 
repositories

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 7 Feb 2013, at 23:19, Wilhelmina Randtke  wrote:

> Thanks!  The list of lists is very helpful.
> 
> -Wilhelmina Randtke
> 
> On Thu, Feb 7, 2013 at 2:40 PM, Habing, Thomas Gerald
> wrote:
> 
>> Here is a registry of OAI-PMH repositories that we maintain (sporadically)
>> here at Illinois:  http://gita.grainger.uiuc.edu/registry/
>> 
>> Tom
>> 
>>> -Original Message-
>>> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
>>> Phillips, Mark
>>> Sent: Thursday, February 07, 2013 2:13 PM
>>> To: CODE4LIB@LISTSERV.ND.EDU
>>> Subject: Re: [CODE4LIB] Directories of OAI-PMH repositories
>>> 
>>> You could start here.
>>> 
>>> http://www.openarchives.org/pmh/
>>> 
>>> Mark
>>> 
>>> From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of
>>> Wilhelmina Randtke [rand...@gmail.com]
>>> Sent: Thursday, February 07, 2013 2:03 PM
>>> To: CODE4LIB@LISTSERV.ND.EDU
>>> Subject: [CODE4LIB] Directories of OAI-PMH repositories
>>> 
>>> Is there a central listing of places that track and list OAI-PMH
>> repository
>>> feeds?  I have an OAI-PMH compliant repository, so now am looking for
>>> places to list that so that harvesters or anyone who is interested can
>> find it.
>>> 
>>> -Wilhelmina Randtke
>> 


Re: [CODE4LIB] You *are* a coder. So what am I?

2013-02-13 Thread Owen Stephens
"Shambrarian": Someone who knows enough truth about how libraries really work, 
but not enough to go insane or be qualified as a real librarian. (See more at 
http://m.urbandictionary.com/#define?term=Shambrarian)

More information available at http://shambrarian.org/

And Dave Pattern has published a handy guide to Librarian/Shambrarian 
interactions
("DO NOT bore the librarian by showing them your Roy Tennant Fan Club 
membership card")
http://daveyp.wordpress.com/2011/07/21/librarianshambrarian-venn-diagram/

Tongue firmly in cheek,

Owen 

On 14 Feb 2013, at 00:22, Maccabee Levine  wrote:

> Andromeda's talk this afternoon really struck a chord, as I shared with her
> afterwards, because I have the same issue from the other side of the fence.
> I'm among the 1/3 of the crowd today with a CS degree and and IT
> background (and no MLS).  I've worked in libraries for years, but when I
> have a point to make about how technology can benefit instruction or
> reference or collection development, I generally preface it with "I'm not a
> librarian, but...".  I shouldn't have to be defensive about that.
> 
> Problem is, 'coder' doesn't imply a particular degree -- just the
> experience from doing the task, and as Andromeda said, she and most C4Lers
> definitely are coders.  But 'librarian' *does* imply MLS/MSLS/etc., and I
> respect that.
> 
> What's a library word I can use in the same way as coder?
> 
> Maccabee
> 
> -- 
> Maccabee Levine
> Head of Library Technology Services
> University of Wisconsin Oshkosh
> levi...@uwosh.edu
> 920-424-7332


[CODE4LIB] British Library Directory of Libraries (probably of interest to UK only)

2013-04-23 Thread Owen Stephens
The British Library has a directory of library codes used by UK registered 
users of its Document Supply service. The Directory of Library Codes enables 
British Library customers to convert into names and addresses the library codes 
they are given in response to location searches. It also indicates each 
library's supply and charging policies. More information at 
http://www.bl.uk/reshelp/atyourdesk/docsupply/help/replycodes/dirlibcodes/

As far as I know the only format this data has ever been made available in is 
PDF. I've always thought this a shame, so I've written a scraper on scraperwiki 
to extract the data from the PDF and make it available as structured, 
query-able data. The scraper and output are at 
https://scraperwiki.com/scrapers/british_library_directory_of_library_codes/

Just in case anyone would find it useful. Also any suggestions for improving 
the scraper are welcome (I don't usually write Python so the code is probably 
even ropier than my normal code :)

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936


Re: [CODE4LIB] DOI scraping

2013-05-17 Thread Owen Stephens
I'd say yes to the investment in jQuery generally - not too difficult to get 
the basics if you already use javascript, and makes some things a lot easier

It sounds like you are trying to do something not dissimilar to LibX 
http://libx.org ? (except via bookmarklet rather than as a browser plugin).
Also looking for custom database scrapers it might be worth looking at Zotero 
translators, as they already exist for many major sources and I guess will be 
grabbing the DOI where it exists if they can 
http://www.zotero.org/support/dev/translators
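
Not the bookmarklet itself (that would be JavaScript/jQuery), but as a rough 
Python sketch of the scraping logic in option 3: look for the common 
'citation_doi' meta tag and fall back to a DOI-shaped regex. Treat the tag name 
and regex as starting points rather than a complete solution, since publisher 
pages vary a lot:

import re
import requests

DOI_PATTERN = re.compile(r"""10\.\d{4,9}/[^\s"'<>]+""")

def find_doi(url):
    html = requests.get(url).text
    # Many publisher pages expose the DOI in a citation_doi meta tag
    meta = re.search(
        r'<meta[^>]+name=["\']citation_doi["\'][^>]+content=["\']([^"\']+)["\']',
        html,
        re.IGNORECASE,
    )
    if meta:
        return meta.group(1)
    # Fall back to the first DOI-shaped string anywhere in the page
    match = DOI_PATTERN.search(html)
    return match.group(0) if match else None

# e.g. find_doi("http://example.com/some-article-page")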

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 17 May 2013, at 05:32, "Fitchett, Deborah"  
wrote:

> Kia ora koutou,
> 
> I’m wanting to create a bookmarklet that will let people on a journal article 
> webpage just click the bookmarklet and get a permalink to that article, 
> including our proxy information so it can be accessed off-campus.
> 
> Once I’ve got a DOI (or other permalink, but I’ll cross that bridge later), 
> the rest is easy. The trouble is getting the DOI. The options seem to be:
> 
> 1.   Require the user to locate and manually highlight the DOI on the 
> page. This is very easy to code, not so easy for the user who may not even 
> know what a DOI is let alone how to find it; and some interfaces make it hard 
> to accurately select (I’m looking at you, ScienceDirect).
> 
> 2.   Live in hope of universal COinS implementation. I might be waiting a 
> long time.
> 
> 3.   Work out, for each database we use, how to scrape the relevant 
> information from the page. Harder/tedious to code, but makes it easy for the 
> user.
> 
> I’ve been looking around for existing code that something like #3. So far 
> I’ve found:
> 
> · CiteULike’s bookmarklet (jQuery at http://www.citeulike.org/bm - 
> afaik it’s all rights reserved)
> 
> · AltMetrics’ bookmarklet (jQuery at 
> http://altmetric-bookmarklet.dsci.it/assets/content.js - MIT licensed)
> 
> Can anyone think of anything else I should be looking at for inspiration?
> 
> Also on a more general matter: I have the general level of Javascript that 
> one gets by poking at things and doing small projects and then getting 
> distracted by other things and then coming back some months later for a 
> different small project and having to relearn it all over again. I’ve long 
> had jQuery on my “I guess I’m going to have to learn this someday but, um, 
> today I just wanna stick with what I know” list. So is this the kind of thing 
> where it’s going to be quicker to learn something about jQuery before I get 
> started, or can I just as easily muddle along with my existing limited 
> Javascript? (What really are the pros and cons here?)
> 
> Nāku noa, nā
> 
> Deborah Fitchett
> Digital Access Coordinator
> Library, Teaching and Learning
> 
> p +64 3 423 0358
> e deborah.fitch...@lincoln.ac.nz<mailto:deborah.fitch...@lincoln.ac.nz> | w 
> library.lincoln.ac.nz<http://library.lincoln.ac.nz/>
> 
> Lincoln University, Te Whare Wānaka o Aoraki
> New Zealand's specialist land-based university
> 
> 
> 
> P Please consider the environment before you print this email.
> "The contents of this e-mail (including any attachments) may be confidential 
> and/or subject to copyright. Any unauthorised use, 
> distribution, or copying of the contents is expressly prohibited.  If you 
> have received this e-mail in error, please advise the sender 
> by return e-mail or telephone and then delete this e-mail together with all 
> attachments from your system."
> 


Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-12 Thread Owen Stephens
Putting the files on GitHub might be an option - free for public repositories, 
and 38Mb should not be a problem to host there

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 12 Jun 2013, at 02:24, Dana Pearson  wrote:

> I have crosswalked the Project Gutenberg RDF/DC metadata to MARC.  I would
> like to make these files available to any library that is interested.
> 
> I thought that I would put them on my website via FTP but don't know if
> that is the best way.  Don't have an ftp client myself so was thinking that
> that may be now passé.
> 
> I tried using Google Drive with access available via the link to two
> versions of the files, UTF8 and MARC8.  However, it seems that that is not
> a viable solution.  I can access the files with the URLs provided by
> setting the access to anyone with the URL but doesn't work for some of
> those testing it for me or with the links I have on my webpage..
> 
> I have five folders with files of about 38 MB total.  I have separated the
> ebooks, audio books, juvenile content, miscellaneous and non-Latin scripts
> such as Chinese, Modern Greek.  Most of the content is in the ebook folder.
> 
> I would like to make access as easy as possible.
> 
> Google Drive seems to work for me.  Here's the link to my page with the
> links in case you would like to look at the folders.  Works for me but not
> for everyone who's tried it.
> 
> http://dbpearsonmlis.com/ProjectGutenbergMarcRecords.html
> 
> thanks,
> dana
> 
> -- 
> Dana Pearson
> dbpearsonmlis.com


Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-12 Thread Owen Stephens
On 12 Jun 2013, at 14:06, Dana Pearson  wrote:

> Thanks for the replies..I had looked at GitHub but thought it something
> different, ie, collaborative software development...I will look again

Yes - that's the main use (git is version control software, GitHub hosts git 
repositories) - but of course git doesn't care what types of files you have 
under version control. It came to mind because I know it's been used to 
distribute metadata files before - e.g. this set of metadata from the Cooper 
Hewitt National Design Museum https://github.com/cooperhewitt/collection

There could be some additional benefits gained through using git to version 
control this type of file, and GitHub to distribute them if you were 
interested, but it can act as simply a place to put the files and make them 
available for download. But of course the other suggestions would do this 
simpler task just as well.

Owen


Re: [CODE4LIB] best way to make MARC files available to anyone

2013-06-13 Thread Owen Stephens
On 13 Jun 2013, at 02:57, Dana Pearson  wrote:

> quick followup on the thread..
> 
> github:  I looked at the cooperhewitt collection but don't see a way to
> download the content...I could copy and paste their content but that may
> not be the best approach for my files...documentation is thin, seems i
> would have to provide email addresses for those seeking access...but
> clearly that is not the case with how the cooperhewitt archive is
> configured..
> 
> My primary concern has been to make it as simple a process as possible for
> libraries which have limited technical expertise. 

I suspect from what you say that GitHub is not what you want in this case. 
However, I just wanted to clarify that you can download files as a Zip file 
(e.g. for Cooper Hewitt 
https://github.com/cooperhewitt/collection/archive/master.zip), and that this 
link is towards the top left on each screen in GitHub. The repository is a 
public one (which is the default, and only option unless you have a paid 
account on GitHub) and you do not need to provide email addresses or anything 
else to access the files on a public repository

Owen


Re: [CODE4LIB] Anyone have access to well-disambiguated sets of publication data?

2013-07-09 Thread Owen Stephens
I'd echo the other comments that finding reliable data is problematic but as a 
suggestion of reasonably good data you could try:

Names was a Jisc-funded project that, as far as I know, isn't currently active, 
but the data available should be of reasonable quality I think. More details on 
the project are available at 
http://names.mimas.ac.uk/files/Final_Report_Names_Phase_Two_September_2011.pdf

Names: for author names + identifiers - e.g. 
http://names.mimas.ac.uk/individual/25256.html?&outputfields=identifiers (this 
one has an ISNI)
Names also provides links to Journal articles - e.g. for same person 
http://names.mimas.ac.uk/individual/25256.html?&outputfields=resultpublications
You could then use the Crossref DOI lookup service to get journal identifiers

Not sure this will get you what you need but might be worth a look

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 9 Jul 2013, at 16:32, Paul Albert  wrote:

> I am exploring methods for author disambiguation, and I would like to have 
> access to one or more set of well-disambiguated data set containing:
> – a unique author identifier (email address, institutional identifier)
> – a unique article identifier (PMID, DOI, etc.)
> – a unique journal identifier (ISSN)
> 
> Definition for "well-disambiguated" – for a given set of authors, you know 
> the identity of their journal articles to a precision and recall of greater 
> than 90-95%.
> 
> Any ideas?
> 
> thanks,
> Paul
> 
> 
> Paul Albert
> Project Manager, VIVO
> Weill Cornell Medical Library
> 646.962.2551


Re: [CODE4LIB] Releasing library holdings metadata openly on the web (was: Libraries and IT Innovation)

2013-07-24 Thread Owen Stephens
On the holdings front also see the work being done on a holding ontology at 
https://github.com/dini-ag-kim/holding-ontology (and related mailing list 
http://lists.d-nb.de/mailman/listinfo/dini-ag-kim-bestandsdaten) - discussion 
all in English

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 23 Jul 2013, at 21:14, Dan Scott  wrote:

> Hi Laura:
> 
> On Tue, Jul 23, 2013 at 12:36 PM, Laura Krier  wrote:
> 
> 
> 
>> The area where I'm most involved right now is in releasing library holdings
>> metadata openly on the web, in discoverable and re-usable forms. It's
>> amazing to me that we still don't do this. Imagine the things that could be
>> created by users and software developers if they had access to information
>> about which libraries hold which resources.
> 
> I'm really interested in your efforts on this front, and where this
> work is taking place, as that's what I'm trying to do as part of my
> participation in the W3C Schema Bib Extend Community Group at
> http://www.w3.org/community/schemabibex/
> 
> See the thread starting around
> http://lists.w3.org/Archives/Public/public-schemabibex/2013Jul/0068.html
> where we're trying to work out how best to surface library holdings in
> schema.org structured data, with one effort focusing on reusing the
> "Offer" class. There are many open questions, of course, but one of
> the end goals (at least for me) is to get the holdings into a place
> where regular people are most likely to find them: in search results
> served up by search engines like Google and Bing.
> 
> If you're not involved in the W3C community group, maybe you should
> be! And it would be great if you could point out where your work is
> taking place so that we can combine forces.
> 
> Dan


Re: [CODE4LIB] netflix search mashups w/ library tools?

2013-08-19 Thread Owen Stephens
From the Netflix API Terms of Use "Titles and Title Metadata may be stored for 
no more than twenty four (24) hours."
http://developer.netflix.com/page/Api_terms_of_use

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 19 Aug 2013, at 16:59, Ken Irwin  wrote:

> Thanks Karen,
> 
> This goes in a bit of a direction from what I'm hoping for and your project 
> does suggest that some matching to build such searches might be possible. 
> 
> What I really want is to apply LCSH and related data to the Netflix search 
> process, essentially dropping Netflix holdings into a library catalog 
> interface. I suspect you'd have to build a local cache of the OCLC data for 
> known Netflix items to do so, and maybe a local cache of the Netflix title 
> list. I wonder if either or both of those actions would violate the TOS for 
> the respective services. 
> 
> Ken
> 
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen 
> Coombs
> Sent: Monday, August 19, 2013 11:26 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] netflix search mashups w/ library tools?
> 
> Ken,
> 
> I did a mashup that took Netflix's top 100 movies and looked to see if a 
> specific library had that item.
> http://www.oclc.org/developer/applications/netflix-my-library
> 
> You might think about doing the following. Search WorldCat for titles on a 
> particular topic and then check to see if the title is available via Netflix. 
> Netflix API for searching their catalog is pretty limited though so it might 
> not give you what you want. It looks like it only allows you to search their 
> streamable content.
> 
> Also I had a lot of trouble with trying to match Netflix titles and library 
> holdings. Because there isn't a good match point. DVDs don't have ISBNs and 
> if you use title you can get into trouble because movies get remade. So title 
> + date seems to work best if you can get the information.
> 
> Karen
> 
> On Mon, Aug 19, 2013 at 8:54 AM, Ken Irwin  wrote:
>> Hi folks,
>> 
>> Is anyone out there using library-like tools for searching Netflix? I'm 
>> imagining a world in which Netflix data gets mashed up with OCLC data or 
>> something like it to populate a more robustly searchable Netflix title list.
>> 
>> Does anything like this exist?
>> 
>> What I really want at the moment is a list of Netflix titles dealing with 
>> Islamic topics (Muhammed, the Qu'ran, the history of Islamic civilizations, 
>> the Hajj, Ramadan, etc.) for doing beyond-the-library readers' advisory in 
>> connection with our ALA/NEH Muslim Journey's Bookshelf. Netflix's own search 
>> tool is singularly awful, and I thought that the library world might have an 
>> interest in doing better.
>> 
>> Any ideas?
>> Thanks
>> Ken


Re: [CODE4LIB] What do you want to learn about linked data?

2013-09-04 Thread Owen Stephens
Just a recommendation for a source of information - I've found 
http://linkeddatabook.com/editions/1.0/ very useful especially in thinking 
about the practicalities of linked data publication and consumption in 
applications

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 4 Sep 2013, at 15:13, "Akerman, Laura"  wrote:

> Karen,
> 
> It's hard to say what "basics" are.  We had a learning group at Emory that 
> covered a lot of the "what is it", including mostly what you've listed but 
> also the environment (library and cultural heritage, and larger environment), 
> but we had a harder time getting to the "what do you do with it" which is 
> what would really motivate and empower people to go ahead and get beyond 
> basics.
> 
> Maybe add:
> 
> How do you embed linked data in web pages using RDFa
> (Difference between RDFa and schema.org/other microdata)
> How do you harvest linked data from web pages, endpoints, or other modes of 
> delivery?
> Different serializations and how to convert
> How do you establish relations between different "vocabularies" (classes and 
> properties) using RDFS and OWL?
> (Demo) New answers to your questions enabled by combining and querying linked 
> data!
> 
> Maybe a step toward "what can you do with it" would be to show (or have an 
> exercise):
> 
> How can a web application interface with linked data?
> 
> I suspect there are a lot of people who've read about it and/or have had 
> tutorials here and there, and who really want to get their hands in it.  
> That's where there's a real dearth of training.
> 
> An "intermediate level" workshop addressing (but not necessarily answering!) 
> questions like:
> 
> Do you need a triplestore or will a relational database do?
> Do you need to store your data as RDF or can you do everything you need with 
> XML or some other format, converting on the way out or in?
> Should you query external endpoints in real time in your application, or 
> cache the data?
> Other than SPARQL, how do you "search" linked data?  Indexing strategies...  
> tools...
> If asserting  OWL "sameAs" is too dangerous in your context, what other 
> strategies for expressing "close to it" relationships between resources 
> (concepts) might work for you?
> Advanced SPARQL using regular expressions, CREATE, etc.
> Care and feeding of triplestores (persistence, memory, )
> Costing out linked data applications:
>   How much additional server space and bandwidth will I (my institution) need 
> to provision in order to work with this stuff?
>   Open source, "free", vs. commercial management systems?
> Backward conversion -transformations from linked data to other data 
> serializations (e.g. metadata standards in XML).
> What else?
> 
> Unfortunately (or maybe just, how it is) no one has built an interface that 
> hides all the programming and technical details from people but lets them 
> experience/experiment with this stuff (have they?).  So some knowledge is 
> necessary.  What are prerequisites and how could we make the burden of 
> knowing them not so onerous to people who don't have much experience in web 
> programming or system administration, so they could get value from a 
> tutorial?
> 
> Laura
> 
> Laura Akerman
> Technology and Metadata Librarian
> Room 208, Robert W. Woodruff Library
> Emory University, Atlanta, Ga. 30322
> (404) 727-6888
> lib...@emory.edu
> 
> 
> 
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen 
> Coyle
> Sent: Wednesday, September 04, 2013 4:59 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] What do you want to learn about linked data?
> 
> All,
> 
> I had a few off-list requests for basics - what are the basic things that 
> librarians need to know about linked data? I have a site where I am putting 
> up a somewhat crudely designed tutorial (with exercises):
> 
> http://kcoyle.net/metadata/
> 
> As you can see, it is incomplete, but I work away on it when so inspired. It 
> includes what I consider to be the basic knowledge:
> 
> 1. What is metadata?
> 2. Data vs. text
> 3. Identifiers (esp. URIs)
> 4. Statements (not records) (read: triples)
> 5. Semantic Web basics
> 6. URIs (more in depth)
> 7. Ontologies
> 8. Vocabularies
> 
> I intend to link various slide sets to this, and anyone is welcome to make 
> use of the content there. It would be GREAT for it to become an actual 
> tutorial, perhaps using better software, but I ha

Re: [CODE4LIB] Open Source ERM

2013-09-20 Thread Owen Stephens
I'm involved in the GOKb project, and also a related project in the UK called 
'KB+' which is a national service providing a knowledgebase and the ability to 
manage subscriptions/licences.
As Adam said, GOKb is definitely more of a service - although the software 
could be run by anyone, it isn't designed with ERM functionality in mind. GOKb 
is a community-managed knowledgebase, and so far much of the work has been to 
build a set of tools for bringing in data from publishers and content 
providers, and to store and manage that data. In the not too distant future 
GOKb will provide data via APIs for use in downstream systems.
Two specific downstream systems GOKb is going to be working with are the Kuali 
OLE system (https://www.kuali.org/ole) and the KB+ system mentioned above. KB+ 
started with very similar ideas to GOKb in terms of building a community 
managed knowledgebase, but with the UK HE community specifically in mind. 
However it is clear that collaborating with GOKb will have significant benefits 
and help the community focus its effort in a single knowledgebase, and so it is 
expected that eventually KB+ will consume data from GOKb, and the community 
will contribute to the data managed in GOKb.

However KB+ also provides more ERM style functionality available to UK 
Universities. Each institution can setup its own subscriptions and licenses, 
drawing on the shared knowledgebase information which is managed centrally by a 
team at Jisc Collections (who negotiate licenses for much of the content in the 
UK, among other things). I think the KB+ software could work as a standalone 
ERMs in terms of functionality, but its strength is as a multi-institution 
system with a shared knowledgebase. We are releasing v3.3 next week which 
brings integration with various discussion forum software - hoping we can put 
community discussion and collaboration at the heart of the product

Development on both KB+ and GOKb is being done by a UK software house called 
Knowledge Integration, and while licenses for the respective code bases have 
not yet been implemented, both should be released under an open licence in the 
future. However the code is already on Github if anyone is interested
http://github.com/k-int/KBPlus/
https://github.com/k-int/gokb-phase1

In both cases they are web apps written in Groovy. GOKb has the added 
complication/interest of also having an Open (was Google) Refine extension, as 
this is the tool chosen for loading messy e-journal data into the system.

Sorry to go on, hope the above is of some interest

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 20 Sep 2013, at 16:26, Karl Holten  wrote:

> A couple of months ago our organization began looking at new ERM solutions / 
> link resolvers, so I thought I'd share my thoughts based on my research of 
> the topic. Unfortunately, I think this is one area where open source 
> offerings are a bit thin. Many offerings look promising at first but are no 
> longer under development. I'd be careful about adopting something that's no 
> longer supported. Out of all the options that are no longer developed, I 
> thought the CUFTS/GODOT combo was the most promising. Out of the options that 
> seem to still be under development, there were two options that stood out: 
> CORAL and GOKb. Neither includes a link resolver, so they weren't good for 
> our needs. CORAL has the advantage of being out on the market right now. GOKb 
> is backed by some pretty big institutions and looks more sophisticated, but 
> other than some slideshows there's not a lot to look at to actually evaluate 
> it at the moment. 
> 
> Ultimately, I came to the conclusion that nothing out there right now matches 
> the proprietary software, especially in terms of link resolvers and in terms 
> of a knowledge base. If I were forced to go open source I'd say the GOKb and 
> CORAL look the most promising. Hope that helps narrow things down at least a 
> little bit.
> 
> Regards,
> Karl Holten
> Systems Integration Specialist
> SWITCH Consortium
> 6801 North Yates Road
> Milwaukee, WI 53217
> http://topcat.switchinc.org/ 
> 
> 
> 
> 
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of 
> Riesner, Giles W.
> Sent: Thursday, September 19, 2013 5:33 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Open Source ERM
> 
> Thank you, Peter.  I took a quick look at the list and found ERMes there as 
> well as a few others.
> Not everything under this category really fits what I'm looking for (e.g. 
> Calibre). I'll look a little deeper.
> 
> Regards,
> 
> 
> Giles W. Riesner, Jr., Lead Library Technician, Library Technolo

Re: [CODE4LIB] Library of Congress

2013-10-01 Thread Owen Stephens
+1

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 1 Oct 2013, at 14:21, "Doran, Michael D"  wrote:

>> As far as I can tell the LOC is up and the offices are closed. HORRAY!!
>> Let's celebrate!
> 
> Before we start celebrating, let's consider our friends and colleagues at the 
> LOC (some of who are code4lib people) who aren't able to work and aren't 
> getting paid starting today.
> 
> -- Michael
> 
> # Michael Doran, Systems Librarian
> # University of Texas at Arlington
> # 817-272-5326 office
> # 817-688-1926 mobile
> # do...@uta.edu
> # http://rocky.uta.edu/doran/
> 
>> -Original Message-
>> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
>> Riley Childs
>> Sent: Tuesday, October 01, 2013 5:28 AM
>> To: CODE4LIB@LISTSERV.ND.EDU
>> Subject: [CODE4LIB] Library of Congress
>> 
>> As far as I can tell the LOC is up and the offices are closed. HORRAY!!
>> Let's celebrate!
>> 
>> Riley Childs
>> Junior and Library Tech Manager
>> Charlotte United Christian Academy
>> +1 (704) 497-2086
>> Sent from my iPhone
>> Please excuse mistakes


Re: [CODE4LIB] usability question: searching for a database (not in a database)

2010-08-01 Thread Owen Stephens
Agree with others about user testing, but from my experience it is better to 
get the application to react intelligently to what is typed in than to expect 
to control what a user is going to enter. 

Type-ahead suggestions may help, but I'm a fan of adding a bit of intelligence 
to the search app - if they type in something that finds a hit in your database 
A-Z, promote those hits in your results screen - perhaps as 'featured results' 
above federated search results etc. 

Also alongside usability testing, keep looking at what is actually being 
searched via the log files, and adjust over time as necessary. 
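
(To illustrate the kind of thing I mean - very much a sketch, with a made-up
A-Z list of database titles; you'd tune the matching and threshold against
what you actually see in the logs:)

import difflib

# Hypothetical A-Z list of database titles - swap in your own
DATABASES = ["Academic Search Complete", "JSTOR", "PsycINFO", "Web of Science"]

def featured_results(query, cutoff=0.6):
    """Return database titles worth promoting as 'featured results'
    above the federated search results."""
    q = query.lower()
    by_lower = {d.lower(): d for d in DATABASES}
    # Substring hits first (the user typed part of the name)
    hits = [d for d in DATABASES if q in d.lower()]
    # Then fuzzy matches to catch near-misses and misspellings
    for m in difflib.get_close_matches(q, by_lower.keys(), n=3, cutoff=cutoff):
        if by_lower[m] not in hits:
            hits.append(by_lower[m])
    return hits

print(featured_results("psycinfo"))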

Owen


On 30 Jul 2010, at 13:22, Sarah Weeks  wrote:

> Long time lurker, first time poster.
> I have a little usability question I was hoping someone could give me advice
> on.
> I'm updating the databases page on our website and we'd like to add a search
> box that would search certain fields we have set up for our databases
> (title, vendor, etc...) so that even if someone doesn't remember the first
> word in the title, they can quickly find the database they're looking
> through without having to scroll through the whole A-Z list.
> My question is: if we add a search box to our main database page, how can we
> make it clear that it's for searching FOR a database and not IN a database?
> Some of the choices we've considered are:
> Seach for a database:
> Search this list:
> Don't remember the name of the database? Search here:
> 
> I'm not feeling convinced by any of them. I'm afraid when people see a
> search box they're not going to bother reading the text but will just assume
> it's a federated search tool.
> 
> Any advice?
> 
> -Sarah Beth
> 
> -- 
> Sarah Beth Weeks
> Interim Head Librarian of Technical Services and Systems
> St Olaf College Rolvaag Memorial Library
> 1510 St. Olaf Avenue
> Northfield, MN 55057
> 507-786-3453 (office)
> 717-504-0182 (cell)


[CODE4LIB] Linking Sakai 'Citation Helper' to other systems

2010-08-23 Thread Owen Stephens
I'm part of a project at Oxford University in the UK that is looking at how we 
can enhance the 'Citation Helper' module in Sakai (which is used to provide the 
Oxford learning environment 'WebLearn')  - enabling faculty members to add 
resources from the Oxford 'resource discovery' solution SOLO (Primo, from Ex 
Libris) and displaying holdings/availability information alongside items in the 
resource lists.

I've just blogged some more information about the project (from my own point of 
view), outlining the approach we are taking. We are aiming to achieve the 
integrations through a 'loosely coupled' approach making use of common 
standards/specifications including:

OpenURL
COinS
Juice framework
DLF-ILS GetAvailability
DAIA (possibly)

All this should mean that what we do at Oxford can be easily transferred to 
other environments. There is a lot more detail in the blog post at 
http://www.meanboyfriend.com/overdue_ideas/2010/08/sir-louie/, and I'd really 
welcome comments/suggestions/issues/questions to inform the project as we start 
developing the solutions for Sir Louie.

Thanks,

Owen


Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936


Re: [CODE4LIB] Linking Sakai 'Citation Helper' to other systems

2010-08-24 Thread Owen Stephens
Thanks Karen - yes, I'm following the work of the group :)

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 23 Aug 2010, at 16:54, Karen Coombs wrote:

> Owen,
> 
> It is probably worth looking at the work of ILS Interoperability group
> which is has adopted the XC NCIP Toolkit to try to provide access to
> ILS data like Availability. The group is very active right now and
> trying to get people to build connectors for the toolkit and improve
> existing code. If you are dealing with ALEPH I believe there is a
> connector for ALEPH in the current version of the XC NCIP toolkit.
> 
> You can check out our activities at http://groups.google.com/group/ils-di
> 
> Karen
> 
> On Mon, Aug 23, 2010 at 6:48 AM, Owen Stephens  wrote:
>> I'm part of a project at Oxford University in the UK that is looking at how 
>> we can enhance the 'Citation Helper' module in Sakai (which is used to 
>> provide the Oxford learning environment 'WebLearn')  - enabling faculty 
>> members to add resources from the Oxford 'resource discovery' solution SOLO 
>> (Primo, from Ex Libris) and displaying holdings/availability information 
>> alongside items in the resource lists.
>> 
>> I've just blogged some more information about the project (from my own point 
>> of view), outlining the approach we are taking. We are aiming to achieve the 
>> integrations through a 'loosely coupled' approach making use of common 
>> standards/specifications including:
>> 
>> OpenURL
>> COinS
>> Juice framework
>> DLF-ILS GetAvailability
>> DAIA (possibly)
>> 
>> All this should be mean that what we do at Oxford can be easily transferred 
>> to other environments. There is a lot more detail in the blog post at 
>> http://www.meanboyfriend.com/overdue_ideas/2010/08/sir-louie/, and I'd 
>> really welcome comments/suggestions/issues/questions to inform the project 
>> as we start developing the solutions for Sir Louie.
>> 
>> Thanks,
>> 
>> Owen
>> 
>> 
>> Owen Stephens
>> Owen Stephens Consulting
>> Web: http://www.ostephens.com
>> Email: o...@ostephens.com
>> Telephone: 0121 288 6936
>> 


[CODE4LIB] Help with DLF-ILS GetAvailability

2010-10-20 Thread Owen Stephens
I'm working with the University of Oxford to look at integrating some library 
services into their VLE/Learning Management System (Sakai). One of the services 
is something that will give availability for items on a reading list in the VLE 
(the Sakai 'Citation Helper'), and I'm looking at the DLF-ILS GetAvailability 
specification to achieve this.

For physical items, the availability information I was hoping to use is 
expressed at the level of a physical collection. For example, if several 
college libraries within the University hold an item, I have aggregated 
information that tells me the availability of the item in each of the college 
libraries. However, I don't have item-level information. 

I can see how I can use simpleavailability to say over the entire institution 
whether (e.g.) a book is available or not. However, I'm not clear I can express 
this in a more granular way (say availability on a library by library basis) 
except by going to item level. Also although it seems you can express multiple 
locations in simpleavailability, and multiple availabilitymsg, there is no way 
I can see to link these, so although I could list each location OK, I can't 
attach an availabilitymsg to a specific location (unless I only express one 
location). 

Am I missing something, or is my interpretation correct? 

Any other suggestions? 

Thanks, 

Owen 

PS also looked at DAIA which I like, but this (as far as I can tell) only 
allows availabitlity to be specified at the level of items


Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936


Re: [CODE4LIB] Help with DLF-ILS GetAvailability

2010-10-21 Thread Owen Stephens
Sorry Jonathan - meant to say thanks - and that your blog posts were already
my 'required reading' for doing anything with ils-di stuff!

Owen

On Wed, Oct 20, 2010 at 8:35 PM, Jonathan Rochkind  wrote:

> I believe you are correct.  The ils-di stuff is just kind of a framework
> starting point, not (yet) a complete end-to-end standards-constrained
> solution.
>
> I believe you will find my thoughts and experiences on this issue helpful.
>  My own circumstances did not involve collection-level anything, but I still
> ended up using an unholy mish-hash of several abused metadata formats to
> express what I needed.
>
> http://bibwild.wordpress.com/2009/09/10/dlf-ils-di-dlfexpanded-service-for-horizon/
>
>
> http://bibwild.wordpress.com/2009/07/31/exposing-holdings-in-dlf-ils-di-standard-format-web-service/
>
>
>
>
> Owen Stephens wrote:
>
>> I'm working with the University of Oxford to look at integrating some
>> library services into their VLE/Learning Management System (Sakai). One of
>> the services is something that will give availability for items on a reading
>> list in the VLE (the Sakai 'Citation Helper'), and I'm looking at the
>> DLF-ILS GetAvailability specification to achieve this.
>>
>> For physical items, the availability information I was hoping to use is
>> expressed at the level of a physical collection. For example, if several
>> college libraries within the University I have aggregated information that
>> tells me the availability of the item in each of the college libraries.
>> However, I don't have item level information.
>> I can see how I can use simpleavailability to say over the entire
>> institution whether (e.g.) a book is available or not. However, I'm not
>> clear I can express this in a more granular way (say availability on a
>> library by library basis) except by going to item level. Also although it
>> seems you can express multiple locations in simpleavailability, and multiple
>> availabilitymsg, there is no way I can see to link these, so although I
>> could list each location OK, I can't attach an availabilitymsg to a specific
>> location (unless I only express one location).
>> Am I missing something, or is my interpretation correct?
>> Any other suggestions?
>> Thanks,
>> Owen
>> PS also looked at DAIA which I like, but this (as far as I can tell) only
>> allows availabitlity to be specified at the level of items
>>
>>
>> Owen Stephens
>> Owen Stephens Consulting
>> Web: http://www.ostephens.com
>> Email: o...@ostephens.com
>> Telephone: 0121 288 6936
>>
>>
>>
>


-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


Re: [CODE4LIB] Help with DLF-ILS GetAvailability

2010-10-21 Thread Owen Stephens
Thanks Dave,

Yes - my reading was that dlf:holdings was for pure 'holdings' as opposed to
'availability'. We could put the simpleavailability in there I guess, but as
you say, since we are controlling both ends there doesn't seem any point in
abusing it like that. The downside is we'd hoped to do something that could be
taken up by other sites - the original plan was to use the Juice framework
(developed by Talis, using jQuery) to parse a standard availability format so
that this could then be applied easily in other environments. Obviously we can
still achieve the outcome we need for the immediate requirements of the
project by using a custom format.

Thanks again

Owen


On Thu, Oct 21, 2010 at 4:28 PM, Walker, David  wrote:

> Hey Owen,
>
> Seems like you could use the dlf:holdings element to hold this kind
> of individual library information.
>
> The DLF-ILS documentation doesn't seem to think that you would use
> dlf:simpleavailability here, though, but rather MARC or ISO holdings
> schemas.
>
> But if you're controlling both ends of the communication, I don't know if
> it really matters.
>
> --Dave
>
> ==
> David Walker
> Library Web Services Manager
> California State University
> http://xerxes.calstate.edu
> ________
> From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Owen
> Stephens [o...@ostephens.com]
> Sent: Wednesday, October 20, 2010 12:22 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] Help with DLF-ILS GetAvailability
>
> I'm working with the University of Oxford to look at integrating some
> library services into their VLE/Learning Management System (Sakai). One of
> the services is something that will give availability for items on a reading
> list in the VLE (the Sakai 'Citation Helper'), and I'm looking at the
> DLF-ILS GetAvailability specification to achieve this.
>
> For physical items, the availability information I was hoping to use is
> expressed at the level of a physical collection. For example, if several
> college libraries within the University I have aggregated information that
> tells me the availability of the item in each of the college libraries.
> However, I don't have item level information.
>
> I can see how I can use simpleavailability to say over the entire
> institution whether (e.g.) a book is available or not. However, I'm not
> clear I can express this in a more granular way (say availability on a
> library by library basis) except by going to item level. Also although it
> seems you can express multiple locations in simpleavailability, and multiple
> availabilitymsg, there is no way I can see to link these, so although I
> could list each location OK, I can't attach an availabilitymsg to a specific
> location (unless I only express one location).
>
> Am I missing something, or is my interpretation correct?
>
> Any other suggestions?
>
> Thanks,
>
> Owen
>
> PS also looked at DAIA which I like, but this (as far as I can tell) only
> allows availabitlity to be specified at the level of items
>
>
> Owen Stephens
> Owen Stephens Consulting
> Web: http://www.ostephens.com
> Email: o...@ostephens.com
> Telephone: 0121 288 6936
>



-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


Re: [CODE4LIB] Help with DLF-ILS GetAvailability

2010-10-21 Thread Owen Stephens
OK - thanks both will pursue this - taking on board Jonathan's points on the 
issues around this

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 21 Oct 2010, at 22:07, Walker, David wrote:

>> Yes - my reading was that dlf:holdings was for pure 'holdings' 
>> as opposed to 'availability'.
> 
> I would agree with Jonathan that putting a summary of item availability in 
> dlf:holdings is not an abuse.
> 
> For example, ISO Holdings -- one of the schemas the DLF-ILS documents 
> suggests using here -- has elements for things like:
> 
>  
>
>  
> 
> Very much the kind of summary information you are using.  Those are different 
> from its  element, which describes individual 
> items.
> 
> So IMO it wouldn't be (much of) a stretch to express this in 
> dlf:simpleavailability instead.
> 
> --Dave
> 
> ==
> David Walker
> Library Web Services Manager
> California State University
> http://xerxes.calstate.edu
> 
> From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Jonathan 
> Rochkind [rochk...@jhu.edu]
> Sent: Thursday, October 21, 2010 1:26 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Help with DLF-ILS GetAvailability
> 
> I don't think that's an abuse.  I consider  to be for
> information about a "holdingset", or some collection of "items", while
>  is for information about an individual item.
> 
> I think regardless of what you do you are being over-optimistic in
> thinking that if you just "do dlf", your stuff will be interchangeable with
> any other clients or servers "doing dlf". The spec is way too open-ended
> for that, it leaves a whole bunch of details not specified and up to the
> implementer.  For better or worse. I made more comments about this in
> the blog post I referenced earlier.
> 
> Jonathan
> 
> Owen Stephens wrote:
>> Thanks Dave,
>> 
>> Yes - my reading was that dlf:holdings was for pure 'holdings' as opposed to
>> 'availability'. We could put the simpleavailability in there I guess but as
>> you say since we are controlling both ends then there doesn't seem any point
>> in abusing it like that. The downside is we'd hoped to do something that
>> could be taken by other sites - the original plan was to use the Juice
>> framework - developed by Talis using jQuery to parse a standard availability
>> format so that this could then be applied easily in other environments.
>> Obviously we can still achieve the outcome we need for the immediate
>> requirements of the project by using a custom format.
>> 
>> Thanks again
>> 
>> Owen
>> 
>> 
>> On Thu, Oct 21, 2010 at 4:28 PM, Walker, David  wrote:
>> 
>> 
>>> Hey Owen,
>>> 
>>> Seems like the you could use the  element to hold this kind
>>> of individual library information.
>>> 
>>> The DLF-ILS documentation doesn't seem to think that you would use
>>> dlf:simpleavailability here, though, but rather MARC or ISO holdings
>>> schemas.
>>> 
>>> But if you're controlling both ends of the communication, I don't know if
>>> it really matters.
>>> 
>>> --Dave
>>> 
>>> ==
>>> David Walker
>>> Library Web Services Manager
>>> California State University
>>> http://xerxes.calstate.edu
>>> 
>>> From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Owen
>>> Stephens [o...@ostephens.com]
>>> Sent: Wednesday, October 20, 2010 12:22 PM
>>> To: CODE4LIB@LISTSERV.ND.EDU
>>> Subject: [CODE4LIB] Help with DLF-ILS GetAvailability
>>> 
>>> I'm working with the University of Oxford to look at integrating some
>>> library services into their VLE/Learning Management System (Sakai). One of
>>> the services is something that will give availability for items on a reading
>>> list in the VLE (the Sakai 'Citation Helper'), and I'm looking at the
>>> DLF-ILS GetAvailability specification to achieve this.
>>> 
>>> For physical items, the availability information I was hoping to use is
>>> expressed at the level of a physical collection. For example, if several
>>> college libraries within the University I have aggregated information that
>>> tells me the availability of the item in each of the college librar

[CODE4LIB] Open Edge - Open Source in Libraries event

2010-12-16 Thread Owen Stephens
Is there a better place to celebrate Burns Night than Edinburgh? This could be 
just the excuse you were looking for...

Open Edge - Open Source in Libraries
This two-day event on open source software for libraries is being run in 
collaboration with JISC and SCONUL. The first day is ’Haggis and Mash’, a 
Mashed Library event, while the second day covers broader issues, in particular 
how capacity might be built to enable open source solutions to flourish in HE 
and FE Libraries.

Mashed Library (http://www.mashedlibrary.com/) is an informal network of 
Library professionals who are interested in how technology can be used to 
enhance library services increasing the ease of access to library data. ’Haggis 
and Mash' is a semi-unconference event which is designed to showcase some of 
best practice from library staff from around the UK, combined with a practical 
element to let delegates come together and brainstorm/develop practical 
solutions for mashing existing library data. Haggis and Mash will have a 
particular focus on the use of Open Source library software, including 
presentations and hands-on workshops covering systems such as Evergreen, VuFind 
and Blacklight, as well as other Open Source projects like Juice - for a full 
programme see http://mashedlibrary.com/wiki/index.php?title=Haggis_and_Mash

This first day is intended for anyone with an interest in the use of technology 
in libraries, and although sessions will have technical content, the event aims 
to offer something to anyone with an interest in technology & libraries - from 
beginners to experienced programmers.
The second day of the event has a broader focus for people with a strategic 
role in HE and FE Libraries and IT, as well as Managers and Practitioners. The 
day will cover four themes:

THEME ONE: Why employ OSS library solutions (the key issues)? There are a 
number of reports on the overall benefits of OSS. This session will summarise 
and analyse the benefits and some challenges.
THEME TWO: What are the OSS solutions for libraries?
(a) summary of what is available: inc. vertical search, ERM, APIs, Widgets, IRs, 
VLE, Digital preservation
Look at some of the solutions in more detail with a focus on the benefits 
rather than details of features
THEME THREE: What capacity do we need for OSS to flourish in libraries?
THEME FOUR: How can we develop that capacity?
For further information about Open Source Library Technology visit 
http://helibtech.com/Open+Source

Hope some of you can make it

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936


[CODE4LIB] Invitation to the Opening Data – Opening Doors Workshop, Manchester (UK), 18th April

2011-03-04 Thread Owen Stephens
How can we gain bigger audiences for our scholarly and cultural resources
and enhance services for researchers, teachers and learners?

In 2010, the JISC and RLUK Resource Discovery Taskforce (RDTF), involving
national stakeholders from libraries, archives and museums, set out a vision
for making the most of UK scholarly and cultural resources.  JISC and their
RDTF partners have now committed to a programme of activity to help fulfil
the vision – building critical mass through opening up data, exploring and
demonstrating what open data makes possible, and actively sharing learning
points with the wider community.


The ‘Opening Data – Opening Doors’ event marks the starting point of this
journey.

We are looking for developers/tech-interested people to contribute to this
event - tell us:

   - How you can use data describing these resources - what innovative
   services or products could be delivered?
   - What things can be done in terms of format/licensing/apis to make
   exploiting this data as easy as possible?
   - Do you have data you can contribute? Are there any barriers to
   contributing data (technical or other), and how could these be overcome?
   - What excites/would excite you about this attempt to open up
   scholarly/cultural resources and enhance services?

(you can get a flavour of what is already happening from this newsletter
http://rdtf.mimas.ac.uk/newsletter/rdtfnewsletter01-march2011.pdf)


Come to the event to:

· Hear from services that are opening up their data including what’s
happening in the new RDTF projects that have just been commissioned

· Help to shape the messages, advice and support offered during the
2011 programme and beyond

· Help to develop practical and engaging approaches to exploiting
our data



Venue: Malmaison Manchester, Piccadilly, Manchester, M1 1LZ

Date: Monday 18th April 2011, 10.00am to 4.00pm



Who should attend? Managers, practitioners, developers and advocates from
libraries, archives, museums, associated publishers and interested
organisations who want early involvement in clarifying, expanding and
challenging the realities of exposing, sharing and exploiting the resource
description data held by our institutions.



Register at: http://rdtf-opening-doors.eventbrite.com/



There is already a lot happening.  To find out more download a copy of the
first RDTF newsletter at:
http://rdtf.mimas.ac.uk/newsletter/rdtfnewsletter01-march2011.pdf



For more information on the JISC and RLUK Resource Discovery Taskforce,
visit: http://rdtf.mimas.ac.uk

-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


Re: [CODE4LIB] MARC magic for file

2011-04-01 Thread Owen Stephens
"I'm sure any decent MARC tool can deal with them, since decent MARC tools
are certainly going to be forgiving enough to deal with four characters that
apparently don't even really matter."

You say that, but I'm pretty sure Marc4J throws errors on MARC records where
these characters are incorrect.
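
(As a quick sanity check outside of a full MARC library, something like this
minimal Python sketch - just the leader checks Bill describes below, using his
example filenames - would flag the problem files:)

def check_marc_leader(path):
    """Rough check on the first record's leader: the record length should be
    five digits and positions 20-23 of the leader should be '4500'."""
    with open(path, "rb") as f:
        leader = f.read(24)
    return (len(leader) == 24
            and leader[:5].isdigit()
            and leader[20:24] == b"4500")

for f in ["bgm_openlib_final_0-5.mrc", "bgm_openlib_final_10-15.mrc"]:
    print(f, check_marc_leader(f))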

Owen

On Fri, Apr 1, 2011 at 3:51 AM, William Denton  wrote:

> On 28 March 2011, Ford, Kevin wrote:
>
>  I couldn't get Simon's MARC 21 Magic file to work.  Among other issues, I
>> received "line too long" errors.  But, since I've been curious about this
>> for sometime, I figured I'd take a whack at it myself.  Try this:
>>
>
> This is very nice!  Thanks.  I tried it on a bunch of MARC files I have,
> and it recognized almost all of them.  A few it didn't, so I had a closer
> look, and they're invalid.
>
> For example, the Internet Archive's Binghamton catalogue dump:
>
> http://ia600307.us.archive.org/6/items/marc_binghamton_univ/
>
> $ file -m marc.magic bgm*mrc
> bgm_openlib_final_0-5.mrc: data
> bgm_openlib_final_10-15.mrc:   MARC Bibliographic
> bgm_openlib_final_15-18.mrc:   data
> bgm_openlib_final_5-10.mrc:MARC Bibliographic
>
> But why?  Aha:
>
> $ head -c 25 bgm_openlib_final_*mrc
> ==> bgm_openlib_final_0-5.mrc <==
> 01812cas  2200457   45x00
> ==> bgm_openlib_final_10-15.mrc <==
> 01008nam  2200289ua 45000
> ==> bgm_openlib_final_15-18.mrc <==
> 01614cam00385   45  0
> ==> bgm_openlib_final_5-10.mrc <==
> 00887nam  2200265v  45000
>
> As you say, the leader should end with 4500 (as defined at
> http://www.loc.gov/marc/authority/adleader.html) but two of those files
> don't.  So they're not valid MARC.  I'm sure any decent MARC tool can deal
> with them, since decent MARC tools are certainly going to be forgiving
> enough to deal with four characters that apparently don't even really
> matter.
>
> So on the one hand they're usable MARC but file wouldn't say so, and on the
> other that's a good indication that the files have failed a basic validity
> test.  I wonder if there are similar situations for JPEGs or MP3s.
>
> I think you should definitely submit this for inclusion in the magic file.
> It would be very useful for us all!
>
> Bill
>
> P.S. I'd never used head -c (to show a fixed number of bytes) before.
> Always nice to find a new useful option to an old command.
>
>
>  #
>> # MARC 21 Magic  (Second cut)
>>
>> # Set at position 0
>> 0   short   >0x
>>
>> # leader ends with 4500
>>
>>> 20  string  4500
>>>
>>
>> # leader starts with 5 digits, followed by codes specific to MARC format
>>
>>> 0   regex/1 (^[0-9]{5})[acdnp][^bhlnqsu-z]  MARC Bibliographic
>>>> 0   regex/1 (^[0-9]{5})[acdnosx][z] MARC Authority
>>>> 0   regex/1 (^[0-9]{5})[cdn][uvxy]  MARC Holdings
>>>> 0   regex/1 (^[0-9]{5})[acdn][w]MARC Classification
>>>> 0   regex/1 (^[0-9]{5})[cdn][q] MARC Community
>>>>
>>>
>
> --
> William Denton, Toronto : miskatonic.org www.frbr.org openfrbr.org
>



-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


[CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Owen Stephens
We are working on converting some MARC library records to RDF, and looking
at how we handle links to LCSH (id.loc.gov) - and I'm looking for feedback
on how we are proposing to do this...

I'm not 100% confident about the approach, and to some extent I'm trying to
work around the nature of how LCSH interacts with RDF at the moment I
guess... but here goes - I would very much appreciate
feedback/criticism/being told why what I'm proposing is wrong:

I guess what I want to do is preserve aspects of the faceted nature of LCSH
in a useful way, give useful links back to id.loc.gov where possible, and
give access to a wide range of facets on which the data set could be
queried. Because of this I'm proposing not just expressing the whole of the
650 field as an LCSH and checking for its existence on id.loc.gov, but also
checking for various combinations of topical term and subdivisions from the
650 field. So for any 650 field I'm proposing we should check on
id.loc.gov for labels matching:

check(650$$a) --> topical term
check(650$$b) --> topical term
check(650$$v) --> Form subdivision
check(650$$x) --> General subdivision
check(650$$y) --> Chronological subdivision
check(650$$z) --> Geographic subdivision

Then using whichever elements exist (all as topical terms):
Check(650$$a--650$$b)
Check(650$$a--650$$v)
Check(650$$a--650$$x)
Check(650$$a--650$$y)
Check(650$$a--650$$z)
Check(650$$a--650$$b--650$$v)
Check(650$$a--650$$b--650$$x)
Check(650$$a--650$$b--650$$y)
Check(650$$a--650$$b--650$$z)
Check(650$$a--650$$b--650$$x--650$$v)
Check(650$$a--650$$b--650$$x--650$$y)
Check(650$$a--650$$b--650$$x--650$$z)
Check(650$$a--650$$b--650$$x--650$$z--650$$v)
Check(650$$a--650$$b--650$$x--650$$z--650$$y)
Check(650$$a--650$$b--650$$x--650$$z--650$$y--650$$v)


As an example given:

650 00 $$aPopular music$$xHistory$$y20th century

We would be checking id.loc.gov for

'Popular music' as a topical term (http://id.loc.gov/authorities/sh85088865)
'History' as a general subdivision (http://id.loc.gov/authorities/sh99005024)
'20th century' as a chronological subdivision (http://id.loc.gov/authorities/sh2002012476)
'Popular music--History and criticism' as a topical term (http://id.loc.gov/authorities/sh2008109787)
'Popular music--20th century' as a topical term (not authorised)
'Popular music--History and criticism--20th century' as a topical term (not
authorised)


And expressing all matches in our RDF.
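
(A minimal sketch of the checking, in Python. This assumes id.loc.gov's
label-lookup service at /authorities/label/<label>, which as I understand it
redirects to the authority URI when the label matches an authorised heading;
if that's not quite the right endpoint, the same approach works against
whatever lookup is available. It also doesn't distinguish topical terms from
subdivisions, which the real checks would need to do:)

import urllib.error
import urllib.parse
import urllib.request

def lookup(label):
    """Return the id.loc.gov URI for an authorised label, or None."""
    url = "http://id.loc.gov/authorities/label/" + urllib.parse.quote(label)
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.geturl()          # final URL after any redirect
    except urllib.error.HTTPError:
        return None                       # no authorised heading with that label

# Subfields from: 650 00 $$aPopular music$$xHistory$$y20th century
a, x, y = "Popular music", "History", "20th century"

candidates = [a, x, y,                    # the individual terms
              a + "--" + x,               # $a--$x
              a + "--" + y,               # $a--$y
              a + "--" + x + "--" + y]    # $a--$x--$y

for label in candidates:
    print(label, "->", lookup(label) or "not authorised")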

My understanding of LCSH isn't what it might be - but the ordering of terms
in the combined string checking is based on what I understand to be the
usual order - is this correct, and should we be checking for alternative
orderings?

Thanks

Owen


-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Owen Stephens
Thanks Tom - very helpful

Perhaps this suggests that rather than using a fixed order we should check
combinations while preserving the order of the original 650 field (I assume
this should in theory always be correct - or at least done to the best of
the cataloguer's knowledge)?

So for:

650 _0 $$a Education $$z England $$x Finance.

check:

Education
England (subdiv)
Finance (subdiv)
Education--England
Education--Finance
Education--England--Finance

While for 650 _0 $$a Education $$x Economic aspects $$z England we check

Education
Economic aspects (subdiv)
England (subdiv)
Education--Economic aspects
Education--England
Education--Economic aspects--England

>
> - It is possible for other orders in special circumstances, e.g. with
> language dictionaries which can go something like:
>
> 650 _0 $$a English language $$v Dictionaries $$x Albanian.
>

This possibility would also be covered by preserving the order - check:

English Language
Dictionaries (subdiv)
Albanian (subdiv)
English Language--Dictionaries
English Language--Albanian
English Language--Dictionaries--Albanian

Creating possibly invalid headings isn't necessarily a problem - as we won't
get a match on id.loc.gov anyway. (Instinctively English Language--Albanian
doesn't feel right)
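
(For what it's worth, generating the order-preserving combinations is
straightforward - a rough sketch, always keeping $$a first and taking ordered
subsets of the remaining subdivisions; the individual subdivision checks would
be done separately:)

from itertools import combinations

def candidate_headings(main, subdivisions):
    """Build candidate headings from a 650 field, preserving the original
    subfield order: $a first, then every ordered subset of the subdivisions."""
    heads = []
    for n in range(len(subdivisions) + 1):
        for combo in combinations(subdivisions, n):
            heads.append("--".join([main] + list(combo)))
    return heads

# 650 _0 $$a Education $$z England $$x Finance
print(candidate_headings("Education", ["England", "Finance"]))
# -> ['Education', 'Education--England', 'Education--Finance',
#     'Education--England--Finance']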


>
> - Some of these are repeatable, so you can have two $$vs following each
> other (e.g. Biography--Dictionaries); two $$zs (very common), as in
> Education--England--London; two $xs (e.g. Biography--History and criticism).
>
OK - that's fine, we can use each individually and in combination for any
repeated headings I think


> - I'm not sure I've ever come across a lot of $$bs in 650s. Do you have a lot of
> them in the database?
>
Hadn't checked until you asked! We have 1 in the dataset in question (c.30k
records) :)


> I'm not sure how possible it would be to come up with a definitive list of
> (reasonable) possible combinations.
>
You are probably right - but I'm not too bothered about aiming at
'definitive' at this stage anyway - but I do want to get something
relatively functional/useful


> Tom
>
> Thomas Meehan
> Head of Current Cataloguing
> University College London Library Services
>
> Owen Stephens wrote:
>
>> We are working on converting some MARC library records to RDF, and looking
>> at how we handle links to LCSH (id.loc.gov <http://id.loc.gov>) - and I'm
>> looking for feedback on how we are proposing to do this...
>>
>>
>> I'm not 100% confident about the approach, and to some extent I'm trying
>> to work around the nature of how LCSH interacts with RDF at the moment I
>> guess... but here goes - I would very much appreciate
>> feedback/criticism/being told why what I'm proposing is wrong:
>>
>> I guess what I want to do is preserve aspects of the faceted nature of
>> LCSH in a useful way, give useful links back to id.loc.gov <
>> http://id.loc.gov> where possible, and give access to a wide range of
>> facets on which the data set could be queried. Because of this I'm proposing
>> not just expressing the whole of the 650 field as a LCSH and checking for
>> it's existence on id.loc.gov <http://id.loc.gov>, but also checking for
>> various combinations of topical term and subdivisions from the 650 field. So
>> for any 650 field I'm proposing we should check on id.loc.gov <
>> http://id.loc.gov> for labels matching:
>>
>>
>> check(650$$a) --> topical term
>> check(650$$b) --> topical term
>> check(650$$v) --> Form subdivision
>> check(650$$x) --> General subdivision
>> check(650$$y) --> Chronological subdivision
>> check(650$$z) --> Geographic subdivision
>>
>> Then using whichever elements exist (all as topical terms):
>> Check(650$$a--650$$b)
>> Check(650$$a--650$$v)
>> Check(650$$a--650$$x)
>> Check(650$$a--650$$y)
>> Check(650$$a--650$$z)
>> Check(650$$a--650$$b--650$$v)
>> Check(650$$a--650$$b--650$$x)
>> Check(650$$a--650$$b--650$$y)
>> Check(650$$a--650$$b--650$$z)
>> Check(650$$a--650$$b--650$$x--650$$v)
>> Check(650$$a--650$$b--650$$x--650$$y)
>> Check(650$$a--650$$b--650$$x--650$$z)
>> Check(650$$a--650$$b--650$$x--650$$z--650$$v)
>> Check(650$$a--650$$b--650$$x--650$$z--650$$y)
>> Check(650$$a--650$$b--650$$x--650$$z--650$$y--650$$v)
>>
>>
>> As an example given:
>>
>> 650 00 $$aPopular music$$xHistory$$y20th century
>>
>> We would be checking id.loc.gov <http://id.loc.gov> for
>>
>>
>> 'Popular music' as a topical term (
>> http://id.loc.gov/authorities/sh85088865)
>> 'History' as

Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Owen Stephens
Still digesting Andrew's response (thanks Andrew), but

On Thu, Apr 7, 2011 at 4:17 PM, Ya'aqov Ziso  wrote:

> Currently under id.loc.gov you will not find name authority records, but
> you can find them at viaf.org.
> [YZ] viaf.org does not include geographic names. I just checked there for
> England.
>

Is this not the relevant VIAF entry?
http://viaf.org/viaf/142995804


-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


Re: [CODE4LIB] LCSH and Linked Data

2011-04-07 Thread Owen Stephens
I'm out of my depth here :)

But... this is what I understood Andrew to be saying. In this instance
(?because 'England' is a Name Authority?) rather than creating a separate LCSH
authority record for 'England' (as the 151), the LCSH subdivision is
recorded in the 781 of the existing Name Authority record.

Searching on http://authorities.loc.gov for England, I find an Authorised
heading, marked as a LCSH - but when I go to that record what I get is the
name authority record n 82068148 - the name authority record as represented
on VIAF by http://viaf.org/viaf/142995804/ (which links to
http://errol.oclc.org/laf/n%20%2082068148.html)

Just as this is getting interesting time differences mean I'm about to head
home :)

Owen

On Thu, Apr 7, 2011 at 4:34 PM, LeVan,Ralph  wrote:

> If you look at the fields those names come from, I think they mean
> England as a corporation, not England as a place.
>
> Ralph
>
> > -Original Message-
> > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
> Of
> > Owen Stephens
> > Sent: Thursday, April 07, 2011 11:28 AM
> > To: CODE4LIB@LISTSERV.ND.EDU
> > Subject: Re: [CODE4LIB] LCSH and Linked Data
> >
> > Still digesting Andrew's response (thanks Andrew), but
> >
> > On Thu, Apr 7, 2011 at 4:17 PM, Ya'aqov Ziso 
> wrote:
> >
> > > *Currently under id.loc.gov you will not find name authority
> records, but
> > > you can find them at viaf.org*.
> > > *[YZ]*  viaf.org does not include geographic names. I just checked
> there
> > > England.
> > >
> >
> > Is this not the relevant VIAF entry
> > http://viaf.org/viaf/142995804
> >
> >
> > --
> > Owen Stephens
> > Owen Stephens Consulting
> > Web: http://www.ostephens.com
> > Email: o...@ostephens.com
>



-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


Re: [CODE4LIB] LCSH and Linked Data

2011-04-08 Thread Owen Stephens
Thanks for all the information and discussion.

I don't think I'm familiar enough with Authority file formats to completely
comprehend - but I certainly understand the issues around the question of
'place' vs 'histo-geo-political entity'. Some of this makes me worry about
the immediate applicability of the LC Authority files in the Linked Data
space - someone said to me recently 'SKOS is just a way of avoiding dealing
with the real semantics' :)

Anyway - putting that to one side, the simplest approach for me at the
moment seems to be to only look at authorised LCSH as represented on id.loc.gov.
Picking up on Andy's first response:

On Thu, Apr 7, 2011 at 3:46 PM, Houghton,Andrew  wrote:

> After having done numerous matching and mapping projects, there are some
> issues that you will face with your strategy, assuming I understand it
> correctly. Trying to match a heading starting at the left most subfield and
> working forward will not necessarily produce correct results when matching
> against the LCSH authority file. Using your example:
>
>
>
> 650 _0 $a Education $z England $x Finance
>
>
>
> is a good example of why processing the heading starting at the left will
> not necessarily produce the correct results.  Assuming I understand your
> proposal you would first search for:
>
>
>
> 150 __ $a Education
>
>
>
> and find the heading with LCCN sh85040989. Next you would look for:
>
>
>
> 181 __ $z England
>
>
>
> and you would NOT find this heading in LCSH.
>

OK - ignoring the question of where the best place to look for this is - I
can live with not matching it for now. Later (perhaps when I understand it
better, or when these headings are added to id.loc.gov) we can revisit this.


> The second issue using your example is that you want to find the “longest”
> matching heading. While the pieces parts are there, so is the enumerated
> authority heading:
>
>
>
> 150 __ $a Education $z England
>
>
>
> as LCCN sh2008102746. So your heading is actually composed of the
> enumerated headings:
>
>
>
> sh2008102746150 __ $a Education $z England
>
> sh2002007885180 __ $x Finance
>
>
>
> and not the separate headings:
>
>
>
> sh85040989 150 __ $a Education
>
> n82068148   150 __ $a England
>
> sh2002007885180 __ $x Finance
>
>
>
> Although one could argue that either analysis is correct depending upon
> what you are trying to accomplish.
>
>
>

What I'm interested in is representing the data as RDF/Linked Data in a way
that opens up the best opportunities for both understanding and querying the
data. Unfortunately at the moment there isn't a good way of representing
LCSH directly in RDF (the MADS work may help I guess but to be honest at the
moment I see that as overly complex - but that's another discussion).

What I can do is make statements that an item is 'about' a subject (probably
using dc:subject) and then point at an id.loc.gov URI. However, if I only
express individual headings:
Education
England (natch)
Finance

Then obviously I lose the context of the full heading - so I also want to
look for
Education--England--Finance (which I won't find on id.loc.gov as not
authorised)

At this point I could stop, but my feeling is that it is useful to also look
for other combinations of the terms:

Education--England (not authorised)
Education--Finance (authorised! http://id.loc.gov/authorities/sh85041008)

My theory is that as long as I stick to combinations that start with a
topical term I'm not going to make startlingly inaccurate statements?


> The matching algorithm I have used in the past contains two routines. The
> first f(a) will accept a heading as a parameter, scrub the heading, e.g.,
> remove unnecessary subfield like $0, $3, $6, $8, etc. and do any other
> pre-processing necessary on the heading, then call the second function f(b).
> The f(b) function accepts a heading as a parameter and recursively calls
> itself until it builds up the list LCCNs that comprise the heading. It first
> looks for the given heading when it doesn’t find it, it removes the **last
> ** subfield and recursively calls itself, otherwise it appends the found
> LCCN to the returned list and exits. This strategy will find the longest
> match.
>

Unless I've misunderstood this, this strategy would not find
'Education--Finance'? Instead I need to remove each *subdivision* in turn
(no matter where it appears in the heading order) and try all possible
combinations, checking each for a match on id.loc.gov. Again, I can do this
without worrying about possible invalid headings, as these wouldn't have
been authorised anyway...

I can check the number of variations around this but I guess that in my
limited set of records (only 30k) there will be a relatively small number of
possible patterns to check.

Does that make sense?


Re: [CODE4LIB] LCSH and Linked Data

2011-04-08 Thread Owen Stephens
Thanks Ross - I have been pushing some cataloguing folk to comment on some
of this as well (and have some feedback) - but I take the point that wider
consultation via autocat could be a good idea. (for some reason this makes
me slightly nervous!)

In terms of whether Education--England--Finance is authorised or not - I
think I took from Andy's response that it wasn't, but also looking at it on
authorities.loc.gov it isn't marked as 'authorised'. Anyway - the relevant
thing for me at this stage is that I won't find a match via id.loc.gov - so
I can't get a URI for it anyway.

There are clearly quite a few issues with interacting with LCSH as Linked
Data at the moment - I'm not that keen on how this currently works, and my
reaction to the MADS/RDF ontology is similar to that of Bruce D'Arcus (see
http://metadata.posterous.com/lcs-madsrdf-ontology-and-the-future-of-the-se),
but on the other hand I want to embrace the opportunity to start joining some
stuff up and seeing what happens :)

Owen

On Fri, Apr 8, 2011 at 3:10 PM, Ross Singer  wrote:

> On Fri, Apr 8, 2011 at 5:02 AM, Owen Stephens  wrote:
>
> > Then obviously I lose the context of the full heading - so I also want to
> > look for
> > Education--England--Finance (which I won't find on id.loc.gov as not
> > authorised)
> >
> > At this point I could stop, but my feeling is that it is useful to also
> look
> > for other combinations of the terms:
> >
> > Education--England (not authorised)
> > Education--Finance (authorised! http://id.loc.gov/authorities/sh85041008
> )
> >
> > My theory is that as long as I stick to combinations that start with a
> > topical term I'm not going to make startlingly inaccurate statements?
>
> I would definitely ask this question somewhere other than Code4lib
> (autocat, maybe?), since I think the answer is more complicated than
> this (although they could validate/invalidate your assumption about
> whether or not this approach would get you "close enough").
>
> My understanding is that Education--England--Finance *is* authorized,
> because Education--Finance is and England is a free-floating
> geographic subdivision.  Because it's also an authorized heading,
> "Education--England--Finance" is, in fact, an authority.  The problem
> is that free-floating subdivisions cause an almost infinite number of
> permutations, so there aren't LCCNs issued for them.
>
> This is where things get super-wonky.  It's also the reason I
> initially created lcsubjects.org, specifically to give these (and,
> ideally, locally controlled subject headings) a publishing
> platform/centralized repository, but it quickly grew to be more than
> "just a side project".  There were issues of how the data would be
> constructed (esp. since, at the time, I had no access to the NAF), how
> to reconcile changes, provenance, etc.  Add to the fact that 2 years
> ago, there wasn't much linked library data going on, it was really
> hard to justify the effort.
>
> But, yeah, it would be worth running your ideas by a few catalogers to
> see what they think.
>
> -Ross.
>



-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


Re: [CODE4LIB] [dpla-discussion] Rethinking the "library" part of DPLA

2011-04-10 Thread Owen Stephens
I guess that people may already be familiar with the Candide 2.0 project at 
NYPL http://candide.nypl.org/text/ - this sounds not dissimilar to the type of 
approach being suggested

This document is built using Wordpress with the Digress.it plugin 
(http://digress.it/)

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 10 Apr 2011, at 17:35, Nate Hill wrote:

> Eric, thanks for finding enough merit in my post on the DPLA listserv
> to repost it here.
> 
> Karen and Peter, I completely agree with your feelings-
> But my point in throwing this idea out there was that despite all of
> the copyright issues, we don't really do a great job making a simple,
> intuitive, branded interface for the works that *are* available - the
> public domain stuff.  Instead we seem to be content with knowing that
> this content is out there, and letting vendors add it to their
> difficult-to-use interfaces.
> 
> I guess my hope, seeing this reposted here is that someone might have
> a suggestion as to why I would not host public domain ebooks on my own
> library's site.  Are there technical hurdles to consider?
> 
> I feel like I see a tiny little piece of the ebook access problem that
> we *can* solve here, while some of the larger issues will indeed be
> debated in forums like the DPLA for quite a while.  By solving a small
> problem along the way, perhaps when the giant 1923-2011 problem is
> resolved we'll have a clearer path as to what type of access we might
> provide.
> 
> 
> On 4/10/11, Peter Murray  wrote:
>> I, too, have been struggling with this aspect of the discussion. (I'm on the
>> DPLA list as well.) There seems to be this blind spot within the leadership
>> of the group to ignore the copyright problem and any interaction with
>> publishers of popular materials. One of the great hopes that I have for this
>> group, with all of the publicity it is generating, is to serve as a voice
>> and a focal point to bring authors, publishers and librarians together to
>> talk about a new digital ownership and sharing model.
>> 
>> That doesn't seem to be happening.
>> 
>> 
>> Peter
>> 
>> On Apr 10, 2011, at 10:05, "Karen Coyle"  wrote:
>> 
>>> I appreciate the spirit of this, but despair at the idea that
>>> libraries organize their services around public domain works, thus
>>> becoming early 20th century institutions. The gap between 1923 and
>>> 2011 is huge, and it makes no sense to users that a library provide
>>> services based on publication date, much less that enhanced services
>>> stop at 1923.
>>> 
>>> kc
>>> 
>>> Quoting Eric Hellman :
>>> 
>>>> The DPLA listserv is probably too impractical for most of Code4Lib,
>>>> but Nate Hill (who's on this list as well) made this contribution
>>>> there, which I think deserves attention from library coders here.
>>>> 
>>>> On Apr 5, 2011, at 11:15 AM, Nate Hill wrote:
>>>> 
>>>>> It is awesome that the project Gutenberg stuff is out there, it is
>>>>> a great start.  But libraries aren't using it right.  There's been
>>>>> talk on this list about the changing role of the public library in
>>>>> people's lives, there's been talk about the library brand, and some
>>>>> talk about what 'local' might mean in this context.  I'd suggest
>>>>> that we should find ways to make reading library ebooks feel local
>>>>> and connected to an immediate community.  Brick and mortar library
>>>>> facilities are public spaces, and librarians are proud of that.  We
>>>>> have collections of materials in there, and we host programs and
>>>>> events to give those materials context within the community.
>>>>> There's something special about watching a child find a good book,
>>>>> and then show it to his  or her friend and talk about how awesome
>>>>> it is.  There's also something special about watching a senior
>>>>> citizens book group get together and discuss a new novel every
>>>>> month.  For some reason, libraries really struggle with treating
>>>>> their digital spaces the same way.
>>>>> 
>>>>> I'd love to see libraries creating online conversations around
>>>>> ebooks in much the same way.  Take a title from project Gutenberg:
>>>>> The Adventures of Huckleberry Finn.  Why not host that bo

Re: [CODE4LIB] RDF for opening times/hours?

2011-06-07 Thread Owen Stephens
I'd suggest having a look at the Good Relations ontology 
http://wiki.goodrelations-vocabulary.org/Quickstart - it's aimed at businesses 
but the OpeningHours specification might do what you need 
http://www.heppnetz.de/ontologies/goodrelations/v1.html#OpeningHoursSpecification

While handling public holidays etc is not immediately obvious it is covered in 
this mail 
http://ebusiness-unibw.org/pipermail/goodrelations/2010-October/000261.html
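
As a quick illustration, something like this (Python/rdflib; the GoodRelations
class and property names are taken from my reading of the spec, and the library
URI is made up, so double-check against the vocabulary) would say "open 9-5 on
Mondays":

from rdflib import Graph, Namespace, BNode, URIRef, Literal
from rdflib.namespace import RDF, XSD

GR = Namespace("http://purl.org/goodrelations/v1#")

g = Graph()
g.bind("gr", GR)

library = URIRef("http://example.org/library#main")  # illustrative URI

# One OpeningHoursSpecification per day (or group of days) sharing the same hours
monday_hours = BNode()
g.add((library, RDF.type, GR.Location))
g.add((library, GR.hasOpeningHoursSpecification, monday_hours))
g.add((monday_hours, RDF.type, GR.OpeningHoursSpecification))
g.add((monday_hours, GR.hasOpeningHoursDayOfWeek, GR.Monday))
g.add((monday_hours, GR.opens, Literal("09:00:00", datatype=XSD.time)))
g.add((monday_hours, GR.closes, Literal("17:00:00", datatype=XSD.time)))

print(g.serialize(format="turtle"))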

Picking up on the previous comment Good Relations in RDFa is one of the formats 
Google use for Rich Snippets and it is also picked up by Yahoo

Owen

On 7 Jun 2011, at 23:05, Tom Keays  wrote:

> There was a time, about 5 years ago, when I assumed that microformats
> were the way to go and spent a bit of time looking at hCalendar for
> representing iCalendar-formatted event information.
> 
> http://microformats.org/wiki/hcalendar
> 
> Not long after that, there was a lot of talk about RDF and RDFa for
> this same purpose. Now I was confused as to whether to change my
> strategy or not, but RDF Calendar seemed to be a good idea. The latter
> also was nice because it could be used to syndicate event information
> via RSS.
> 
> http://pemberton-vandf.blogspot.com/2008/06/how-to-do-hcalendar-in-rdfa.html
> http://www.w3.org/TR/rdfcal/
> 
> These days it seems to be all about HTML5 microdata, especially
> because of Rich Snippets and Google's support for this approach.
> 
> http://html5doctor.com/microdata/#microdata-action
> 
> All three approaches allow you to embed iCalendar formatted event
> information on a web page. All three of them do it differently. I'm
> even more confused now than I was 5 years ago. This should not be this
> hard, yet there is still no definitive way to deploy this information
> and preserve the semantics of the event information. Part of this may
> be because the iCalendar format, although widely used, is itself
> insufficient.
> 
> Tom


[CODE4LIB] PDF->text extraction

2011-06-21 Thread Owen Stephens
The CORE project at The Open University in the UK is doing some work on finding 
similarity between papers in institutional repositories (see 
http://core-project.kmi.open.ac.uk/ for more info).  The first step in the 
process is extracting text from the (mainly) pdf documents harvested from 
repositories.

We've tried iText but had issues with quality
We moved to PDFBox but are having performance issues
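
For anyone wanting to reproduce the kind of batch extraction we're doing, here's a
minimal sketch using PDFBox's command-line app and its ExtractText tool (the jar
name and directory layout are illustrative, and this isn't exactly our setup):

import pathlib
import subprocess

PDFBOX_JAR = "pdfbox-app.jar"  # illustrative path to the PDFBox app jar

def extract_text(pdf_path, txt_path):
    """Run PDFBox's ExtractText tool on a single PDF."""
    subprocess.run(
        ["java", "-jar", PDFBOX_JAR, "ExtractText", str(pdf_path), str(txt_path)],
        check=True,
        timeout=300,  # stop one pathological PDF from blocking the whole batch
    )

for pdf in pathlib.Path("pdfs").glob("*.pdf"):
    try:
        extract_text(pdf, pdf.with_suffix(".txt"))
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as err:
        print("failed:", pdf, err)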

Any other suggestions/experience?

Thanks,

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936


Re: [CODE4LIB] PDF->text extraction

2011-06-22 Thread Owen Stephens
Thanks to all for the info and suggestions - we'll have a look at them.

Via another route I've had http://snowtide.com/PDFTextStream recommended 
(commercial, but looks like they are generally open to offering academic 
licenses for free at least for a limited period) - anyone tried that?

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 22 Jun 2011, at 03:43, Bill Janssen wrote:

> Simon Spero  wrote:
> 
>> Another option is to use the  ABBYY FineReader
>> SDK<http://www.abbyy.com/ocr_sdk_linux/overview/>.
>> Annoyingly, the linux version is one release behind the windows SDK (which
>> has improved support for multi core processing of single document).  Since
>> Owen's problem  is embarrassingly parallel, multi-core tuning isn't as
>> useful as being able to run on a local cluster or regional grid.   ABBYY
>> software tends to be a little pricey, but the results are usually very good.
> 
> If you're going to OCR, Nuance OmniPage is also very good, and I believe
> costs about the same as FineReader.  We also use tOCR, from Transym,
> which is Windows-only, but very accurate and cheap.  I have yet to see
> decent results on complicated pages (technical papers) from either
> OCRopus or Tesseract with the default models that they come with; I
> believe they're both still aimed at book page OCR.
> 
> Bill


[CODE4LIB] Developer Competition using Library/Archive/Museum data

2011-07-04 Thread Owen Stephens
Celebrate Liberation – A worldwide competition for open software developers & 
open data
UK Discovery (http://discovery.ac.uk/) and the Developer Community Supporting 
Innovation (DevCSI) project based at UKOLN are running a global Developer 
Competition throughout July 2011 to build open source software applications / 
tools, using at least one of our 10 open data sources collected from libraries, 
museums and archives.
Enter simply by blogging about your application and emailing the blog post URI 
to joy.pal...@manchester.ac.uk by the deadline of 2359 (your local time) on 
Monday 1 August 2011.
Full details of the competition, the data sets and how to enter are at 
http://discovery.ac.uk/developers/competition/
There are 13 prizes including 
Best entry for each dataset – there are 10 datasets so there could be 10 
winners of £30 Amazon vouchers and an aggregation could win more than one!

Data Munging – Best example of Consolidating or Aggregating or De-duplicating 
or Entity matching or … one prize of £100 Amazon voucher.

Overall winners – An EEE Pad Transformer for the overall winner and a £200 
Amazon voucher for the Runner Up.

And you can win more than once :)
Specific competition tag on twitter is #discodev, but #devcsi and #ukdiscovery 
also good to follow/use
Excited to see what people come up with - hope some of you are able to enter
Owen
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936


[CODE4LIB] Show reuse of library/archive/museum data and win prizes

2011-08-08 Thread Owen Stephens
 the ways in which you have used 
this data so we can understand more fully the benefits of sharing it and 
improve our services. Please contact metad...@bl.uk if you wish to share your 
experiences with us and those that are using this service. Give Credit Where 
Credit is Due: The British Library has a responsibility to maintain its 
bibliographic data on the nation’s behalf. Please credit all use of this data 
to the British Library and link back to www.bl.uk/bibliographic/datafree.html 
in order that this information can be shared and developed with today’s 
Internet users as well as future generations. Duplicate of package:bluk-bnb

Tyne and Wear Museums Collections (Imagine)
Part of the Europeana Linked Open Data, this is a collection of metadata 
describing (and linking to digital copies where appropriate) items in the Tyne 
and Wear Museums Collections.

Cambridge University Library dataset #1
This data marks the first major output of the COMET project. COMET is a JISC 
funded collaboration between Cambridge University Library and CARET, University 
of Cambridge. It is funded under the JISC Infrastructure for Resource Discovery 
programme. It represents work over a 20+ year period which contains a number of 
changes in practices and cataloguing tools. No attempt has been made to screen 
for quality of records other than the Voyager export process. This data also 
includes the 180,000 'Tower Project' records published under the JISC Open 
Bibliography Project. 

JISC MOSAIC Activity Data
The JISC MOSAIC (www.sero.co.uk/jisc-mosaic.html) project gathered together 
data covering user activity in a few UK Higher Education libraries. The data is 
available for download and via an API and contains information on books 
borrowed during specific time periods, and where available describes links 
between books, courses, and year of study.

OpenURL Router Data (EDINA)
EDINA is making the OpenURL Router Data available from April 2011. It is 
derived from the logs of the OpenURL Router, which directs user requests for 
academic papers to the appropriate institutional resolver. It enables 
institutions to register their resolver once only, at 
[http://openurl.ac.uk](http://openurl.ac.uk "OpenURL Router"), and service 
providers may then use openurl.ac.uk as the “base URL” for OpenURL links for UK 
HE and FE customers. This is the product of JISC-funded project activity, and 
provides a unique data set. The data captured varies from request to request 
since different users enter different information into requests. Further 
information on the details of the data set, sample files and the data itself is 
available at 
[http://openurl.ac.uk/doc/data/data.html](http://openurl.ac.uk/doc/data/data.html
 "OpenURL Router Data"). The team would like to thank all the institutions 
involved in this initiative for their participation. The data are made 
available under the Open Data Commons (ODC) Public Domain Dedication and 
Licence and the ODC Attribution Sharealike Community Norms.



Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936


Re: [CODE4LIB] CIS students, service learning, and the library

2011-10-14 Thread Owen Stephens
I was going to point to that too, and also note that the DevXS event was the 
brainchild of two students at the University of Lincoln, who went on to work at 
the University - including developing 'Jerome', a library search interface using 
MongoDB and the Sphinx index/search s/w http://jerome.library.lincoln.ac.uk/

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 13 Oct 2011, at 23:04, Robert Robertson wrote:

> Hi Ellen,
> 
> The event hasn't been held yet but it might be worth taking a look at what 
> DevCSI are doing with their DevXS event http://devxs.org/ and seeing what 
> comes out of it after the fact.
> 
> The DevCSI initiative (http://devcsi.ukoln.ac.uk/blog/) has run quite a few 
> hackday events (inlcuding dev8D ) as part of an effort to build a stronger 
> community of developers in HE in the UK and some of their events and 
> challenges have been around library data. 
> 
> DevXS is their first major foray into trying the same idea with CS and other 
> students but it might offer some ideas for events that could raise interest 
> in longer term service learning projects or tackle specific tasks.
> 
> cheers,
> John
> 
> 
> R. John Robertson
> skype: rjohnrobertson
> Research Fellow/ Open Education Resources programme support officer (JISC 
> CETIS),
> Centre for Academic Practice and Learning Enhancement
> University of Strathclyde
> Tel:+44 (0) 141 548 3072
> http://blogs.cetis.ac.uk/johnr/
> The University of Strathclyde is a charitable body, registered in Scotland, 
> with registration number SC015263
> 
> From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ellen K. 
> Wilson [ewil...@jaguar1.usouthal.edu]
> Sent: Thursday, October 13, 2011 9:29 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] CIS students, service learning, and the library
> 
> I am wondering if anyone has experience working with students
> (particularly CIS students) in service learning projects involving the
> library. I am currently supervising four first-year students who are
> working on a brief (10 hour) project involving the usability and
> redesign of the homepage as part of a first year seminar course.
> Obviously we won't get the whole thing done, but it is providing us with
> some valuable student insight into what should be on the page, etc.
> 
> I anticipate the CIS department's first-year experience program will
> want to continue this collaboration, so I'm trying to brainstorm some
> projects that might be useful for future semesters particularly for
> freshmen who are just beginning their course of study in computer
> science, information technology, or information systems. This semester's
> project was thrown together in only a few days and I would like to not
> do that again! Ideas would be appreciated.
> 
> Best regards,
> 
> Ellen
> 
> --
> Ellen Knowlton Wilson
> Instructional Services Librarian
> Room 250, University Library
> University of South Alabama
> 5901 USA Drive North
> Mobile, AL 36688
> (251) 460-6045
> ewil...@jaguar1.usouthal.edu


[CODE4LIB] Mobile technologies in libraries - fact finding survey

2011-11-24 Thread Owen Stephens
The m-libraries support project (http://www.m-libraries.info/) is part of 
JISC’s Mobile Infrastructure for Libraries programme 
(http://infteam.jiscinvolve.org/wp/2011/10/11/mobile-infrastructure-for-libraries-new-projects/)
 running from November 2011 until September 2012.

The project aims to build a collection of useful resources and case studies 
based on current developments using mobile technologies in libraries, and to 
foster a community for those working in the m-library area or interested in 
learning more.

A brief introductory survey has been devised to help inform the project - as a 
way of starting to gather information, to discover what information is needed 
to help libraries decide on a way forward, and to begin to understand what an 
m-libraries community could offer to help.

The survey should only take 5-10 minutes and all questions are optional. 

This is an open survey - please pass the survey link on to anyone else you 
think might be interested via email or social media: http://svy.mk/mlibs1 

If you’re interested in mobile technologies in libraries and would like to 
receive updates about the project, please visit our project blog at 
http://m-libraries.info and subscribe to updates (links in the right hand side 
for RSS or email subscriptions).

Thanks and best wishes,

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936


Re: [CODE4LIB] Models of MARC in RDF

2011-11-28 Thread Owen Stephens
It would be great to start collecting transforms together - just a quick brain 
dump of some I'm aware of

MARC21 transformations
Cambridge University Library - http://data.lib.cam.ac.uk - transformation made 
available (in code) from same site
Open University - http://data.open.ac.uk - specific transform for materials 
related to teaching, code available at 
http://code.google.com/p/luceroproject/source/browse/trunk%20luceroproject/OULinkedData/src/uk/ac/open/kmi/lucero/rdfextractor/RDFExtractor.java
 (MARC transform is in libraryRDFExtraction method)
COPAC - small set of records from the COPAC Union catalogue - data and 
transform not yet published
Podes Projekt - LinkedAuthors - documentation at 
http://bibpode.no/linkedauthors/doc/Pode-LinkedAuthors-Documentation.pdf - 2 
stage transformation firstly from MARC to FRBRized version of data, then from 
FRBRized data to RDF. These linked from documentation
Podes Project - LinkedNonFiction - documentation at 
http://bibpode.no/linkednonfiction/doc/Pode-LinkedNonFiction-Documentation.pdf 
- MARC data transformed using xslt 
https://github.com/pode/LinkedNonFiction/blob/master/marcslim2n3.xsl

British Library British National Bibliography - 
http://www.bl.uk/bibliographic/datafree.html - data model documented, but no 
code available
Libris.se - some notes in various presentations/blogposts (e.g. 
http://dc2008.de/wp-content/uploads/2008/09/malmsten.pdf) but can't find 
explicit transformation
Hungarian National library - 
http://thedatahub.org/dataset/hungarian-national-library-catalog and 
http://nektar.oszk.hu/wiki/Semantic_web#Implementation - some information on 
ontologies used but no code or explicit transformation (not 100% sure this is 
from MARC)
Talis - implemented in several live catalogues including 
http://catalogue.library.manchester.ac.uk/  - no documentation or code afaik 
although some notes in 

MAB transformation
HBZ - some of the transformation documented at 
https://wiki1.hbz-nrw.de/display/SEM/Converting+the+Open+Data+from+the+hbz+to+BIBO,
 don't think any code published?

Would be really helpful if more projects published their transformations (or 
someone told me where to look!)

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 26 Nov 2011, at 15:58, Karen Coyle wrote:

> A few of the code4lib talk proposals mention projects that have or will 
> transform MARC records into RDF. If any of you have documentation and/or 
> examples of this, I would be very interested to see them, even if they are 
> "under construction."
> 
> Thanks,
> kc
> 
> -- 
> Karen Coyle
> kco...@kcoyle.net http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet


Re: [CODE4LIB] Models of MARC in RDF

2011-12-02 Thread Owen Stephens
Hi Esme - thanks for this. Do you have any documentation on which predicates 
you've used and on the MODS->RDF transformation?
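
(In case it's useful to anyone following along, this is the sort of thing I'd
try for the MARC->MODS step Esme describes below - a sketch using lxml and the
LoC stylesheet; it assumes the records are already MARCXML, and it isn't how
UCSD actually do it:)

from lxml import etree

# The LoC stylesheet works on MARCXML, so binary MARC needs converting first
# (e.g. with pymarc's record_to_xml). Keep any helper stylesheets it includes
# (such as MARC21slimUtils.xsl) alongside the main file.
transform = etree.XSLT(etree.parse("MARC21slim2MODS.xsl"))

marcxml = etree.parse("record.marcxml")
mods = transform(marcxml)

print(etree.tostring(mods, pretty_print=True).decode("utf-8"))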

Owen

On 2 Dec 2011, at 16:07, Esme Cowles  wrote:

> Owen-
> 
> Another strategy for capturing MARC data in RDF is to convert it to MODS (we 
> do this using the LoC MARC to MODS stylesheet: 
> http://www.loc.gov/standards/marcxml/xslt/MARC21slim2MODS.xsl).  From there, 
> it's pretty easy to incorporate into RDF.  There are some issues to be aware 
> of, such as how to map the MODS XML names to predicates and how to handle 
> elements that can appear in multiple places in the hierarchy.
> 
> -Esme
> --
> Esme Cowles 
> 
> "Necessity is the plea for every infringement of human freedom. It is the
> argument of tyrants; it is the creed of slaves." -- William Pitt, 1783
> 
> On 11/28/2011, at 8:25 AM, Owen Stephens wrote:
> 
>> It would be great to start collecting transforms together - just a quick 
>> brain dump of some I'm aware of
>> 
>> MARC21 transformations
>> Cambridge University Library - http://data.lib.cam.ac.uk - transformation 
>> made available (in code) from same site
>> Open University - http://data.open.ac.uk - specific transform for materials 
>> related to teaching, code available at 
>> http://code.google.com/p/luceroproject/source/browse/trunk%20luceroproject/OULinkedData/src/uk/ac/open/kmi/lucero/rdfextractor/RDFExtractor.java
>>  (MARC transform is in libraryRDFExtraction method)
>> COPAC - small set of records from the COPAC Union catalogue - data and 
>> transform not yet published
>> Podes Projekt - LinkedAuthors - documentation at 
>> http://bibpode.no/linkedauthors/doc/Pode-LinkedAuthors-Documentation.pdf - 2 
>> stage transformation firstly from MARC to FRBRized version of data, then 
>> from FRBRized data to RDF. These linked from documentation
>> Podes Project - LinkedNonFiction - documentation at 
>> http://bibpode.no/linkednonfiction/doc/Pode-LinkedNonFiction-Documentation.pdf
>>  - MARC data transformed using xslt 
>> https://github.com/pode/LinkedNonFiction/blob/master/marcslim2n3.xsl
>> 
>> British Library British National Bibliography - 
>> http://www.bl.uk/bibliographic/datafree.html - data model documented, but no 
>> code available
>> Libris.se - some notes in various presentations/blogposts (e.g. 
>> http://dc2008.de/wp-content/uploads/2008/09/malmsten.pdf) but can't find 
>> explicit transformation
>> Hungarian National library - 
>> http://thedatahub.org/dataset/hungarian-national-library-catalog and 
>> http://nektar.oszk.hu/wiki/Semantic_web#Implementation - some information on 
>> ontologies used but no code or explicit transformation (not 100% sure this 
>> is from MARC)
>> Talis - implemented in several live catalogues including 
>> http://catalogue.library.manchester.ac.uk/  - no documentation or code afaik 
>> although some notes in 
>> 
>> MAB transformation
>> HBZ - some of the transformation documented at 
>> https://wiki1.hbz-nrw.de/display/SEM/Converting+the+Open+Data+from+the+hbz+to+BIBO,
>>  don't think any code published?
>> 
>> Would be really helpful if more projects published their transformations (or 
>> someone told me where to look!)
>> 
>> Owen
>> 
>> Owen Stephens
>> Owen Stephens Consulting
>> Web: http://www.ostephens.com
>> Email: o...@ostephens.com
>> Telephone: 0121 288 6936
>> 
>> On 26 Nov 2011, at 15:58, Karen Coyle wrote:
>> 
>>> A few of the code4lib talk proposals mention projects that have or will 
>>> transform MARC records into RDF. If any of you have documentation and/or 
>>> examples of this, I would be very interested to see them, even if they are 
>>> "under construction."
>>> 
>>> Thanks,
>>> kc
>>> 
>>> -- 
>>> Karen Coyle
>>> kco...@kcoyle.net http://kcoyle.net
>>> ph: 1-510-540-7596
>>> m: 1-510-435-8234
>>> skype: kcoylenet


Re: [CODE4LIB] Models of MARC in RDF

2011-12-02 Thread Owen Stephens
Oh - and perhaps just/more importantly - how do you create URIs for your data 
and how do you reconcile against other sources?

Owen

On 2 Dec 2011, at 16:07, Esme Cowles  wrote:

> Owen-
> 
> Another strategy for capturing MARC data in RDF is to convert it to MODS (we 
> do this using the LoC MARC to MODS stylesheet: 
> http://www.loc.gov/standards/marcxml/xslt/MARC21slim2MODS.xsl).  From there, 
> it's pretty easy to incorporate into RDF.  There are some issues to be aware 
> of, such as how to map the MODS XML names to predicates and how to handle 
> elements that can appear in multiple places in the hierarchy.
> 
> -Esme
> --
> Esme Cowles 
> 
> "Necessity is the plea for every infringement of human freedom. It is the
> argument of tyrants; it is the creed of slaves." -- William Pitt, 1783
> 
> On 11/28/2011, at 8:25 AM, Owen Stephens wrote:
> 
>> It would be great to start collecting transforms together - just a quick 
>> brain dump of some I'm aware of
>> 
>> MARC21 transformations
>> Cambridge University Library - http://data.lib.cam.ac.uk - transformation 
>> made available (in code) from same site
>> Open University - http://data.open.ac.uk - specific transform for materials 
>> related to teaching, code available at 
>> http://code.google.com/p/luceroproject/source/browse/trunk%20luceroproject/OULinkedData/src/uk/ac/open/kmi/lucero/rdfextractor/RDFExtractor.java
>>  (MARC transform is in libraryRDFExtraction method)
>> COPAC - small set of records from the COPAC Union catalogue - data and 
>> transform not yet published
>> Podes Projekt - LinkedAuthors - documentation at 
>> http://bibpode.no/linkedauthors/doc/Pode-LinkedAuthors-Documentation.pdf - 2 
>> stage transformation firstly from MARC to FRBRized version of data, then 
>> from FRBRized data to RDF. These linked from documentation
>> Podes Project - LinkedNonFiction - documentation at 
>> http://bibpode.no/linkednonfiction/doc/Pode-LinkedNonFiction-Documentation.pdf
>>  - MARC data transformed using xslt 
>> https://github.com/pode/LinkedNonFiction/blob/master/marcslim2n3.xsl
>> 
>> British Library British National Bibliography - 
>> http://www.bl.uk/bibliographic/datafree.html - data model documented, but no 
>> code available
>> Libris.se - some notes in various presentations/blogposts (e.g. 
>> http://dc2008.de/wp-content/uploads/2008/09/malmsten.pdf) but can't find 
>> explicit transformation
>> Hungarian National library - 
>> http://thedatahub.org/dataset/hungarian-national-library-catalog and 
>> http://nektar.oszk.hu/wiki/Semantic_web#Implementation - some information on 
>> ontologies used but no code or explicit transformation (not 100% sure this 
>> is from MARC)
>> Talis - implemented in several live catalogues including 
>> http://catalogue.library.manchester.ac.uk/  - no documentation or code afaik 
>> although some notes in 
>> 
>> MAB transformation
>> HBZ - some of the transformation documented at 
>> https://wiki1.hbz-nrw.de/display/SEM/Converting+the+Open+Data+from+the+hbz+to+BIBO,
>>  don't think any code published?
>> 
>> Would be really helpful if more projects published their transformations (or 
>> someone told me where to look!)
>> 
>> Owen
>> 
>> Owen Stephens
>> Owen Stephens Consulting
>> Web: http://www.ostephens.com
>> Email: o...@ostephens.com
>> Telephone: 0121 288 6936
>> 
>> On 26 Nov 2011, at 15:58, Karen Coyle wrote:
>> 
>>> A few of the code4lib talk proposals mention projects that have or will 
>>> transform MARC records into RDF. If any of you have documentation and/or 
>>> examples of this, I would be very interested to see them, even if they are 
>>> "under construction."
>>> 
>>> Thanks,
>>> kc
>>> 
>>> -- 
>>> Karen Coyle
>>> kco...@kcoyle.net http://kcoyle.net
>>> ph: 1-510-540-7596
>>> m: 1-510-435-8234
>>> skype: kcoylenet


Re: [CODE4LIB] Models of MARC in RDF

2011-12-06 Thread Owen Stephens
I'd suggest that rather than shove it in a triple it might be better to point 
at alternative representations, including MARC if desirable (keep meaning to 
blog some thoughts about progressively enhanced metadata...)
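
A sketch of what I mean, versus embedding the record (Python/rdflib; the
property choice - dcterms:hasFormat here - the URIs and the made-up
ex:sourceRecord property are purely illustrative):

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS

g = Graph()
book = URIRef("http://example.org/id/book/123")  # illustrative URI

# Option 1 (what I'm suggesting): point at an alternative representation
g.add((book, DCTERMS.hasFormat,
       URIRef("http://example.org/records/123.marcxml")))

# Option 2 (the 'stuff it in a triple' approach): embed the raw record as a
# literal, via a hypothetical property, shown only for comparison
EX = Namespace("http://example.org/terms/")
g.add((book, EX.sourceRecord, Literal("00942nam a2200277 a 4500...")))

print(g.serialize(format="turtle"))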

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 6 Dec 2011, at 15:44, Karen Coyle wrote:

> Quoting "Fleming, Declan" :
> 
>> Hi - I'll note that the mapping decisions were made by our metadata services 
>> (then Cataloging) group, not by the tech folks making it all work, though we 
>> were all involved in the discussions.  One idea that came up was to do a, 
>> perhaps, lossy translation, but also stuff one triple with a text dump of 
>> the whole MARC record just in case we needed to grab some other element out 
>> we might need.  We didn't do that, but I still like the idea.  Ok, it was my 
>> idea.  ;)
> 
> I like that idea! Now that "disk space" is no longer an issue, it makes good 
> sense to keep around the "original state" of any data that you transform, 
> just in case you change your mind. I hadn't thought about incorporating the 
> entire MARC record string in the transformation, but as I recall the average 
> size of a MARC record is somewhere around 1K, which really isn't all that 
> much by today's standards.
> 
> (As an old-timer, I remember running the entire Univ. of California union 
> catalog on 35 megabytes, something that would now be considered a smallish 
> email attachment.)
> 
> kc
> 
>> 
>> D
>> 
>> -Original Message-
>> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Esme 
>> Cowles
>> Sent: Monday, December 05, 2011 11:22 AM
>> To: CODE4LIB@LISTSERV.ND.EDU
>> Subject: Re: [CODE4LIB] Models of MARC in RDF
>> 
>> I looked into this a little more closely, and it turns out it's a little 
>> more complicated than I remembered.  We built support for transforming to 
>> MODS using the MODS21slim2MODS.xsl stylesheet, but don't use that.  Instead, 
>> we use custom Java code to do the mapping.
>> 
>> I don't have a lot of public examples, but there's at least one public 
>> object which you can view the MARC from our OPAC:
>> 
>> http://roger.ucsd.edu/search/.b4827884/.b4827884/1,1,1,B/detlmarc~1234567&FF=&1,0,
>> 
>> The public display in our digital collections site:
>> 
>> http://libraries.ucsd.edu/ark:/20775/bb0648473d
>> 
>> The RDF for the MODS looks like:
>> 
>>
>>local
>>FVLP 222-1
>>
>>
>>ARK
>>
>> http://libraries.ucsd.edu/ark:/20775/bb0648473d
>>
>>
>>Brown, Victor W
>>personal
>>
>>
>>Amateur Film Club of San Diego
>>corporate
>>
>>
>>[196-]
>>
>>
>>2005
>>Film and Video Library, University of California, 
>> San Diego, La Jolla, CA 92093-0175 
>> http://orpheus.ucsd.edu/fvl/FVLPAGE.HTM
>>
>>
>>reformatted digital
>>16mm; 1 film reel (25 min.) :; sd., col. ;
>>
>>
>>lcsh
>>Ranching
>>
>> 
>> etc.
>> 
>> 
>> There is definitely some loss in the conversion process -- I don't know 
>> enough about the MARC leader and control fields to know if they are captured 
>> in the MODS and/or RDF in some way.  But there are quite a few local and 
>> note fields that aren't present in the RDF.  Other fields (e.g. 300 and 505) 
>> are mapped to MODS, but not displayed in our access system (though they are 
>> indexed for searching).
>> 
>> I agree it's hard to quantify lossy-ness.  Counting fields or characters 
>> would be the most objective, but has obvious problems with control 
>> characters sometimes containing a lot of information, and then the relative 
>> importance of different fields to the overall description.  There are other 
>> issues too -- some fields in this record weren't migrated because they 
>> duplicated collection-wide values, which are formulated slightly differently 
>> from the MARC record.  Some fields weren't migrated because they concern the 
>> physical object, and therefore don't really apply to the digital object.  So 
>> that really seems like a morass 

Re: [CODE4LIB] Models of MARC in RDF

2011-12-06 Thread Owen Stephens
I think the strength of adopting RDF is that it doesn't tie us to a single 
vocab/schema. That isn't to say it isn't desirable for us to establish common 
approaches, but that we need to think slightly differently about how this is 
done - more application profiles than 'one true schema'.

This is why RDA worries me - because it (seems to?) suggest that we define a 
schema that stands alone from everything else and that is used by the library 
community. I'd prefer to see the library community adopting the best of what 
already exists and then enhancing where the existing ontologies are lacking. If 
we are going to have a (web of) linked data, then re-use of ontologies and IDs 
is needed. For example in the work I did at the Open University in the UK we 
ended up using only a single property from a specific library ontology (the draft 
ISBD http://metadataregistry.org/schemaprop/show/id/1957.html "has place of 
publication, production, distribution").

I think it is interesting that many of the MARC->RDF mappings so far have 
adopted many of the same ontologies (although no doubt partly because there is 
a 'follow the leader' element to this - or at least there was for me when 
looking at the transformation at the Open University)

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 5 Dec 2011, at 18:56, Jonathan Rochkind wrote:

> On 12/5/2011 1:40 PM, Karen Coyle wrote:
>> 
>> This brings up another point that I haven't fully grokked yet: the use of 
>> MARC kept library data "consistent" across the many thousands of libraries 
>> that had MARC-based systems. 
> 
> Well, only somewhat consistent, but, yeah.
> 
>> What happens if we move to RDF without a standard? Can we rely on linking to 
>> provide interoperability without that rigid consistency of data models?
> 
> Definitely not. I think this is a real issue.  There is no magic to "linking" 
> or RDF that provides interoperability for free; it's all about the 
> vocabularies/schemata -- whether in MARC or in anything else.   (Note 
> different national/regional  library communities used different schemata in 
> MARC, which made interoperability infeasible there. Some still do, although 
> gradually people have moved to Marc21 precisely for this reason, even when 
> Marc21 was less powerful than the MARC variant they started with).
> 
> That is to say, if we just used MARC's own implicit vocabularies, but output 
> them as RDF, sure, we'd still have consistency, although we wouldn't really 
> _gain_ much.On the other hand, if we switch to a new better vocabulary -- 
> we've got to actually switch to a new better vocabulary.  If it's just 
> "whatever anyone wants to use", we've made it VERY difficult to share data, 
> which is something pretty darn important to us.
> 
> Of course, the goal of the RDA process (or one of em) was to create a new 
> schema for us to consistently use. That's the library community effort to 
> maintain a common schema that is more powerful and flexible than MARC.  If 
> people are using other things instead, apparently that failed, or at least 
> has not yet succeeded.


Re: [CODE4LIB] Models of MARC in RDF

2011-12-07 Thread Owen Stephens
Fair point. Just instinct on my part that putting it in a triple is a bit ugly 
:)

It probably doesn't make any difference, although I don't think storing in a 
triple ensures that it sticks to the object (you could store the triple 
anywhere as well)

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 6 Dec 2011, at 22:43, Fleming, Declan wrote:

> Hi - point at it where?  We could point back to the library catalog that we 
> harvested in the MARC to MODS to RDF process, but what if that goes away?  
> Why not write ourselves a 1K insurance policy that sticks with the object for 
> its life?
> 
> D
> 
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Owen 
> Stephens
> Sent: Tuesday, December 06, 2011 8:06 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Models of MARC in RDF
> 
> I'd suggest that rather than shove it in a triple it might be better to point 
> at alternative representations, including MARC if desirable (keep meaning to 
> blog some thoughts about progressively enhanced metadata...)
> 
> Owen
> 
> Owen Stephens
> Owen Stephens Consulting
> Web: http://www.ostephens.com
> Email: o...@ostephens.com
> Telephone: 0121 288 6936
> 
> On 6 Dec 2011, at 15:44, Karen Coyle wrote:
> 
>> Quoting "Fleming, Declan" :
>> 
>>> Hi - I'll note that the mapping decisions were made by our metadata 
>>> services (then Cataloging) group, not by the tech folks making it all 
>>> work, though we were all involved in the discussions.  One idea that 
>>> came up was to do a, perhaps, lossy translation, but also stuff one 
>>> triple with a text dump of the whole MARC record just in case we 
>>> needed to grab some other element out we might need.  We didn't do 
>>> that, but I still like the idea.  Ok, it was my idea.  ;)
>> 
>> I like that idea! Now that "disk space" is no longer an issue, it makes good 
>> sense to keep around the "original state" of any data that you transform, 
>> just in case you change your mind. I hadn't thought about incorporating the 
>> entire MARC record string in the transformation, but as I recall the average 
>> size of a MARC record is somewhere around 1K, which really isn't all that 
>> much by today's standards.
>> 
>> (As an old-timer, I remember running the entire Univ. of California 
>> union catalog on 35 megabytes, something that would now be considered 
>> a smallish email attachment.)
>> 
>> kc
>> 
>>> 
>>> D
>>> 
>>> -Original Message-
>>> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf 
>>> Of Esme Cowles
>>> Sent: Monday, December 05, 2011 11:22 AM
>>> To: CODE4LIB@LISTSERV.ND.EDU
>>> Subject: Re: [CODE4LIB] Models of MARC in RDF
>>> 
>>> I looked into this a little more closely, and it turns out it's a little 
>>> more complicated than I remembered.  We built support for transforming to 
>>> MODS using the MODS21slim2MODS.xsl stylesheet, but don't use that.  
>>> Instead, we use custom Java code to do the mapping.
>>> 
>>> I don't have a lot of public examples, but there's at least one public 
>>> object which you can view the MARC from our OPAC:
>>> 
>>> http://roger.ucsd.edu/search/.b4827884/.b4827884/1,1,1,B/detlmarc~123
>>> 4567&FF=&1,0,
>>> 
>>> The public display in our digital collections site:
>>> 
>>> http://libraries.ucsd.edu/ark:/20775/bb0648473d
>>> 
>>> The RDF for the MODS looks like:
>>> 
>>>   
>>>   local
>>>   FVLP 222-1
>>>   
>>>   
>>>   ARK
>>>   
>>> http://libraries.ucsd.edu/ark:/20775/bb0648473d
>>>   
>>>   
>>>   Brown, Victor W
>>>   personal
>>>   
>>>   
>>>   Amateur Film Club of San Diego
>>>   corporate
>>>   
>>>   
>>>   [196-]
>>>   
>>>   
>>>   2005
>>>   Film and Video Library, University of California, 
>>> San Diego, La Jolla, CA 92093-0175 
>>> http://orpheus.ucsd.edu/fvl/FVLPAGE.HTM
>>>   
>>>   
>>>   reformatted digital
>>>   16mm; 1 film reel (25 min.) :; sd., col.

Re: [CODE4LIB] Models of MARC in RDF

2011-12-07 Thread Owen Stephens
When I did a project converting records from UKMARC -> MARC21 we kept the 
UKMARC records for a period (about 5 years I think) while we assured ourselves 
that we hadn't missed anything vital. We did occasionally refer back to the 
older record to check things, but having not found any major issues with the 
conversion after that period we felt confident disposing of the record. This is 
the type of usage I was imagining for a copy of the MARC record in this 
scenario.

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 7 Dec 2011, at 01:52, Montoya, Gabriela wrote:

> One critical thing to consider with MARC records (or any metadata, for that 
> matter) is that it they are not stagnant, so what is the value of storing 
> entire record strings into one triple if we know that metadata is volatile? 
> As an example, UCSD has over 200,000 art images that had their metadata 
> records ingested into our local DAMS over five years ago. Since then, many of 
> these records have been edited/massaged in our OPAC (and ARTstor), but these 
> updated records have not been refreshed in our DAMS. Now we find ourselves 
> needing to desperately have the "What is our database of record?" 
> conversation.
> 
> I'd much rather see resources invested in data synching than spending it in 
> saving text dumps that will most likely not be referred to again.
> 
> Dream Team for Building a MARC > RDF Model: Karen Coyle, Alistair Miles, 
> Diane Hillman, Ed Summers, Bradley Westbrook.
> 
> Gabriela
> 
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Karen 
> Coyle
> Sent: Tuesday, December 06, 2011 7:44 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Models of MARC in RDF
> 
> Quoting "Fleming, Declan" :
> 
>> Hi - I'll note that the mapping decisions were made by our metadata 
>> services (then Cataloging) group, not by the tech folks making it all 
>> work, though we were all involved in the discussions.  One idea that 
>> came up was to do a, perhaps, lossy translation, but also stuff one 
>> triple with a text dump of the whole MARC record just in case we 
>> needed to grab some other element out we might need.  We didn't do 
>> that, but I still like the idea.  Ok, it was my idea.  ;)
> 
> I like that idea! Now that "disk space" is no longer an issue, it makes good 
> sense to keep around the "original state" of any data that you transform, 
> just in case you change your mind. I hadn't thought about incorporating the 
> entire MARC record string in the transformation, but as I recall the average 
> size of a MARC record is somewhere around 1K, which really isn't all that 
> much by today's standards.
> 
> (As an old-timer, I remember running the entire Univ. of California union 
> catalog on 35 megabytes, something that would now be considered a smallish 
> email attachment.)
> 
> kc
> 
>> 
>> D
>> 
>> -Original Message-
>> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf 
>> Of Esme Cowles
>> Sent: Monday, December 05, 2011 11:22 AM
>> To: CODE4LIB@LISTSERV.ND.EDU
>> Subject: Re: [CODE4LIB] Models of MARC in RDF
>> 
>> I looked into this a little more closely, and it turns out it's a 
>> little more complicated than I remembered.  We built support for 
>> transforming to MODS using the MODS21slim2MODS.xsl stylesheet, but 
>> don't use that.  Instead, we use custom Java code to do the mapping.
>> 
>> I don't have a lot of public examples, but there's at least one public 
>> object which you can view the MARC from our OPAC:
>> 
>> http://roger.ucsd.edu/search/.b4827884/.b4827884/1,1,1,B/detlmarc~1234
>> 567&FF=&1,0,
>> 
>> The public display in our digital collections site:
>> 
>> http://libraries.ucsd.edu/ark:/20775/bb0648473d
>> 
>> The RDF for the MODS looks like:
>> 
>>
>>local
>>FVLP 222-1
>>
>>
>>ARK
>> 
>> http://libraries.ucsd.edu/ark:/20775/bb0648473d
>>
>>
>>Brown, Victor W
>>personal
>>
>>
>>Amateur Film Club of San Diego
>>corporate
>>
>>
>>[196-]
>>
>>
>>2005
>>Film and Video Library, University of 
>> California, San Diego, La Jolla, CA 92093-0175 
>> http://

Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-07 Thread Owen Stephens
On 7 Dec 2011, at 00:38, Alexander Johannesen wrote:

> Hiya,
> 
> Karen Coyle  wrote:
>> I wonder how easy it will be to
>> manage a metadata scheme that has cherry-picked from existing ones, so
>> something like:
>> 
>> dc:title
>> bibo:chapter
>> foaf:depiction
> 
> Yes, you're right in pointing out this as a problem. And my answer is;
> it's complicated. My previous "rant" on this list was about data
> models*, and dangnabbit if this isn't related as well.
> 
> What your example is doing is pointing out a new model based on bits
> of other models. This works fine, for the most part, when the concepts
> are simple; simple to understand, simple to extend. Often you'll find
> that what used to be unclear has grown clear over time (as more and
> more have used FOAF, you'll find some things are more used and better
> understood, while other parts of it fade into 'we don't really use
> that anymore')
> 
> But when things get complicated, it *can* render your model unusable.
> Mixed data models can be good, but can also lead directly to meta data
> hell. For example ;
> 
>  dc:title
>  foaf:title
> 
> Ouch. Although not a biggie, I see this kind of discrepancy all the
> time, so the argument against mixed models is of course that the power
> of definition lies with you rather than some third-party that might
> change their mind (albeit rare) or have similar terms that differ
> (more often).
> 
> I personally would say that the library world should define RDA as you
> need it to be, and worry less about reuse at this stage unless you
> know for sure that the external models do bibliographic meta data
> well.
> 

I agree this is a risk, and I suspect there is a further risk around simply the 
feeling of 'ownership' by the community - perhaps it is easier to feel 
ownership over an entire ontology than an 'application profile' of some kind.
It may be that mapping is the solution to this, but if this is really going to 
work I suspect it needs to be done from the very start - otherwise it is just 
another crosswalk, and we'll get varying views on how much one thing maps to 
another (but perhaps that's OK - I'm not looking for perfection)

That said, I believe we need absolutely to be aiming for a world in which we 
work with mixed ontologies - no matter what we do other, relevant, data sources 
will use FOAF, Bibo etc.. I'm convinced that this gives us the opportunity to 
stop treating what are very mixed materials in a single way, while still 
exploiting common properties. For example Musical materials are really not well 
catered for in MARC, and we know there are real issues with applying FRBR to 
them - and I see the implementation of RDF/Linked Data as an opportunity to 
tackle this issue by adopting alternative ontologies where it makes sense, 
while still assigning common properties (dc:title) where this makes sense.


> HOWEVER!
> 
> When we're done talking about ontologies and vocabularies, we need to
> talk about identifiers, and there I would swing the other way and let
> reuse govern, because it is when you reuse an identifier you start
> thinking about what that identifiers means to *both* parties. Or, put
> differently ;
> 
> It's remarkably easier to get this right if the identifier is a
> number, rather than some word. And for that reason I'd say reuse
> identifiers (subject proxies) as they are easier to get right and
> bring a lot of benefits, but not ontologies (model proxies) as they
> can be very difficult to get right and don't necessarily give you what
> you want.

Agreed :)


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Owen Stephens
The other issue that the 'modelling' brings (IMO) is that the model influences 
use - or better the other way round, the intended use and/or audience should 
influence the model. This raises questions for me about the value of a 
'neutral' model - which is what I perceive libraries as aiming for - treating 
users as a homogeneous mass with needs that will be met by a single approach. 
Obviously there are resource implications to developing multiple models for 
different uses/audiences, and once again I'd argue that an advantage of the 
linked data approach is that it allows for the effort to be distributed amongst 
the relevant communities.

To be provocative - has the time come for us to abandon the idea that 
'libraries' act as one where cataloguing is concerned, and our metadata serves 
the same purpose in all contexts? (I can't decide if I'm serious about this or 
not!)

Owen



Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 11 Dec 2011, at 23:47, Karen Coyle wrote:

> Quoting Richard Wallis :
> 
> 
>> You get the impression that the BL "chose a subset of their current
>> bibliographic data to expose as LD" - it was kind of the other way around.
>> Having modeled the 'things' in the British National Bibliography domain
>> (plus those in related domain vocabularis such as VIAF, LCSH, Geonames,
>> Bio, etc.), they then looked at the information held in their [Marc] bib
>> records to identify what could be extracted to populate it.
> 
> Richard, I've been thinking of something along these lines myself, especially 
> as I see the number of "translating X to RDF" projects go on. I begin to 
> wonder what there is in library data that is *unique*, and my conclusion is: 
> not much. Books, people, places, topics: they all exist independently of 
> libraries, and libraries cannot take the credit for creating any of them. So 
> we should be able to say quite a bit about the resources in libraries using 
> shared data points -- and by that I mean, data points that are also used by 
> others. So once you decide on a model (as BL did), then it is a matter of 
> looking *outward* for the data to re-use.
> 
> I maintain, however, as per my LITA Forum talk [1] that the subject headings 
> (without talking about quality thereof) and classification designations that 
> libraries provide are an added value, and we should do more to make them 
> useful for discovery.
> 
> 
>> 
>> I know it is only semantics (no pun intended), but we need to stop using
>> the word 'record' when talking about the future description of 'things' or
>> entities that are then linked together.   That word has so many built in
>> assumptions, especially in the library world.
> 
> I'll let you battle that one out with Simon :-), but I am often at a loss for 
> a better term to describe the unit of metadata that libraries may create in 
> the future to describe their resources. Suggestions highly welcome.
> 
> kc
> [1] http://kcoyle.net/presentations/lita2011.html
> 
> 
> 
> 
> 
> -- 
> Karen Coyle
> kco...@kcoyle.net http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet


Re: [CODE4LIB] Namespace management, was Models of MARC in RDF

2011-12-12 Thread Owen Stephens
On 11 Dec 2011, at 23:30, Richard Wallis wrote:

> 
> There is no document I am aware of, but I can point you at the blog post by
> Tim Hodson [
> http://consulting.talis.com/2011/07/british-library-data-model-overview/]
> who helped the BL get to grips with and start thinking Linked Data.
> Another by the BL's Neil Wilson [
> http://consulting.talis.com/2011/10/establishing-the-connection/] filling
> in the background around his recent presentations about their work.

Neil Wilson at the BL has indicated a few times that in principle the BL has no 
problem sharing the software they used to extract the relevant data from the 
MARC records, but that there are licensing issues around the s/w due to the use 
of a proprietary compiler (sorry, I don't have any more details so I can't 
explain any more than this). I'm not sure whether this extends to sharing the 
source that would tell us what exactly was happening, but I think this would be 
worth more discussion with Neil - I'll try to pursue it with him when I get a 
chance

Owen


Re: [CODE4LIB] creating call number browse

2008-09-29 Thread Owen Stephens
It just seems that if you've got Endeca doing the heavy lifting already, then 
building something separate just to let you enter a sorted results list at a 
specific point sounds like hard work?

Two possible approaches that occur to me (and of course not knowing Endeca they 
may be well off base I guess).

Can Endeca retrieve all records with a call number, and drop the user into a 
specific point in the sorted results set? I'm guessing not, otherwise you 
probably wouldn't be looking for alternative approaches. Is the problem 
dropping the user in at the right point in the sorted results set, or in the 
size of the results set generated?

An alternative approach possibly? If Endeca can retrieve results and display 
them in Call Number order, then could you not submit a search that retrieves a 
'shelf' of books at a time? That is, take a Call Number as an input, calculate 
a range around the call number to search and pass this to Endeca? This allows 
you to control the set size, but still there is a question of whether Endeca 
can drop the user into a specific point within a sorted results set. If not, 
then can it return records in a format that you can then manipulate (e.g. XML)? 
With a small, pre-sorted, results set, it should be relatively easy to build 
something that drops the user into the correct point based on their search?
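
Something along these lines, perhaps - a naive sketch (real LC call number
normalisation is much fiddlier than the sort key used here, so treat it as
illustrative only):

import bisect
import re

def sort_key(call_number):
    """Very rough LC sort key: class letters, zero-padded class number, rest."""
    m = re.match(r"([A-Z]+)\s*(\d+)?(.*)", call_number.upper())
    letters, number, rest = m.group(1), m.group(2) or "0", m.group(3).strip()
    return (letters, "%06d" % int(number), rest)

def shelf_window(call_numbers, target, before=10, after=10):
    """Return a 'shelf' of call numbers centred on where the target would sit."""
    ordered = sorted(call_numbers, key=sort_key)
    keys = [sort_key(cn) for cn in ordered]
    pos = bisect.bisect_left(keys, sort_key(target))
    return ordered[max(0, pos - before):pos + after]

print(shelf_window(["QA76.73 .P98", "QA76.9 .D3", "QL737 .C2", "Z695 .Z8"],
                   "QA76.76 .H94"))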

Owen

Owen Stephens
Assistant Director: eStrategy and Information Resources
Central Library
Imperial College London
South Kensington Campus
London
SW7 2AZ
 
t: +44 (0)20 7594 8829
e: [EMAIL PROTECTED]
> -Original Message-
> From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
> Emily Lynema
> Sent: 21 September 2008 16:38
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] creating call number browse
> 
> Well, we're using LC and SUDOC here. What I really want is something
> that is both searchable and browsable, so that users can type in a call
> number and then browse backward and forward as much as they want in
> call
> number order.
> 
> We have Endeca here, so my patrons can browse into the LC scheme and
> then sort the results in call number order, but I don't have a way to
> browse forward and backward starting with a specific call number (like
> you would if you were browsing the shelves physically).
> 
> -emily
> 
> Keith Jenkins wrote:
> > Emily,
> >
> > Are you using LC or Dewey?
> >
> > A while back, I wanted to generate browsable lists of new books,
> > organized by topic.  I ended up using the LC call number to group the
> > titles into manageable groups.  Here's an example:
> > http://supportingcast.mannlib.cornell.edu/newbooks/?loc=mann
> >
> > Titles are sorted by call number, and also grouped by the initial
> > letters of the LC classification, such as "Q" or "QL".  For monthly
> > lists of new books, most groupings usually have less than 20 titles,
> > which makes for easy browsing of titles within someone's general
> > subject of interest.  The Table of Contents at the top of the page
> > only lists those classifications that are present in the set of
> titles
> > currently being viewed.  (In an earlier version, Q would only be
> split
> > into QA, QB, etc. if there were more than 20 items with Q call
> > numbers.)
> >
> > Things do tend to get a bit out of control in some of the
> > classifications for literature... no one wants to scan through a list
> > of 452 titles:
> > http://supportingcast.mannlib.cornell.edu/newbooks/?class=PL
> >
> > So for entire collections, a lot more work would be needed to create
> > finer subgroups, since each classification is uniquely complex.  For
> > example:
> >   PL1-8844 : Languages of Eastern Asia, Africa, Oceania
> >   PL1-481 : Ural-Altaic languages
> >   PL21-396 : Turkic languages
> >   PL400-431 : Mongolian languages
> >   PL450-481 : Tungus Manchu languages
> >
> > (An idea... maybe it would work to simply forget about pre-
> determined,
> > named call number ranges and look for "natural breaks" in the call
> > numbers, rather than trying to model the intricate details of each
> > individual classification schedule.)
> >
> > The site runs on a set of MARC records extracted from the catalog.
> > Users can also subscribe to RSS feeds for any combination of
> location,
> > language, or classification group.
> >
> > I did some early experimentation to include cover images, but never
> > seemed to get enough matches to make that worthwhile.
> >
> > Keith
> >
> > Keith Jenkins
> > GIS/Geospatial Applications Librari

Re: [CODE4LIB] Zotero, unapi, and formats?

2010-04-06 Thread Owen Stephens
At the moment Zotero development seems to be focussing on the use of RDFa
using the Bibo ontology for picking up bib details from within pages (see
discussion on the Bibo Google group)

Owen

On Tue, Apr 6, 2010 at 4:17 PM, Chad Fennell  wrote:

> > It's still a LOT better than COinS for Zotero, I assume though.
>
> Yes, if only because you get more complete metadata with things like
> RIS than COinS does via OpenURL.  I do like the theoretical benefit of
> a metadata format request API , but the promise of richer metadata
> (primarily for Zotero) was ultimately why I chose unAPI over COinS.
> And yeah, better documentation would be nice, thanks for looking into
> it.
>
> -Chad
>



-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


[CODE4LIB] OCLC UK Mashathon and 'Liver and Mash' Mashed Library event (Liverpool, May 13-14)

2010-04-14 Thread Owen Stephens
Just a quick plug for the OCLC Mashathon event taking place in Liverpool on
Thursday 13th May: http://mashlib2010.wordpress.com/

The Mashathon is being followed by a Mashed Library event ("Liver and Mash")
the following day at the same venue.  Delegates can attend either (or both)
days.

As Mashed Library has become a really popular event in the UK, the bookings
for that are going quite quickly (we currently have 13 spaces left) but
there's still plenty of room at the Mashathon.

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


[CODE4LIB] Bibliographic data on Freebase

2010-04-22 Thread Owen Stephens
Thought that some on the list might be interested in various discussions
happening on the Freebase email list at the moment:

Firstly some stuff on dealing with ISBNs
http://lists.freebase.com/pipermail/freebase-discuss/2010-April/thread.html,
and secondly (and more interesting I think) work on loading a University
Library catalogue (see the link for emails with the subject "UniversityX
Book 
Load<http://lists.freebase.com/pipermail/freebase-discuss/2010-April/001200.html>",
and also this page http://wiki.freebase.com/wiki/UniversityX_Load), which
includes work on mapping place of publication to existing Freebase location
information (
http://lists.freebase.com/pipermail/freebase-discuss/2010-April/001218.html)

Owen

-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


Re: [CODE4LIB] Bibliographic data on Freebase

2010-04-22 Thread Owen Stephens
I can't wait to get my hands on Gridworks - very excited (sad, I know), but
think potential for checking/correcting metadata and adding links to
relevant Freebase topics is very exciting

If anyone hasn't seen the demos of this yet, suggest having a look at the
screencasts at
http://blog.freebase.com/2010/03/26/preview-freebase-gridworks/

Owen

On Thu, Apr 22, 2010 at 2:45 PM, Sean Hannan  wrote:

> Also, David Huynh (of Gridworks and Freebase Parallax fame) dropped into
> IRC last week asking about MARC4J and its possible use with Gridworks.
>
> Things are afoot.
>
> -Sean
>
>
> On Apr 22, 2010, at 7:27 AM, Owen Stephens wrote:
>
> > Thought that some on the list might be interested in various discussions
> > happening on the Freebase email list at the moment:
> >
> > Firstly some stuff on dealing with ISBNs
> >
> http://lists.freebase.com/pipermail/freebase-discuss/2010-April/thread.html
> ,
> > and secondly (and more interesting I think) work on loading a University
> > Library catalogue (see the link for emails with the subject "UniversityX
> > Book Load<
> http://lists.freebase.com/pipermail/freebase-discuss/2010-April/001200.html
> >",
> > and also this page http://wiki.freebase.com/wiki/UniversityX_Load),
> which
> > includes work on mapping place of publication to existing Freebase
> location
> > information (
> >
> http://lists.freebase.com/pipermail/freebase-discuss/2010-April/001218.html
> )
> >
> > Owen
> >
> > --
> > Owen Stephens
> > Owen Stephens Consulting
> > Web: http://www.ostephens.com
> > Email: o...@ostephens.com
>



-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


Re: [CODE4LIB] Twitter annotations and library software

2010-04-28 Thread Owen Stephens
We've had problems with RIS on a recent project. Although there is a
specification (http://www.refman.com/support/risformat_intro.asp), it is (I
feel) lacking enough rigour to ever be implemented consistently. The most
common issue in the wild that I've seen is use of different tags for the
same information (which the specification does not nail down enough to know
when each should be used):

Use of TI or T1 for primary title
Use of AU or A1 for primary author
Use of UR, L1 or L2 to link to 'full text'

Perhaps more significantly the specification doesn't include any field
specifically for a DOI, but despite this EndNote (owned by ISI ResearchSoft,
who are also responsible for the RIS format specification) includes the DOI
in a DO field in its RIS output - not to specification.
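
In case it's useful to anyone else parsing RIS in the wild, we've ended up normalising the synonymous tags before mapping fields. A rough sketch (Python), assuming you're happy to collapse the variants listed above and just want a dict of fields per record:

# Map the tag variants seen in the wild onto a single canonical tag;
# "DO" isn't in the spec but is what EndNote emits for the DOI.
SYNONYMS = {"T1": "TI", "A1": "AU", "L1": "UR", "L2": "UR", "DO": "DOI"}

def parse_ris(text):
    """Very rough RIS reader: yields one dict of lists per record (ER closes a record)."""
    record = {}
    for line in text.splitlines():
        if len(line) < 5 or line[2:5] != "  -":
            continue  # ignore blank lines and continuations in this sketch
        tag = SYNONYMS.get(line[:2], line[:2])
        value = line[6:].strip()
        if tag == "ER":
            if record:
                yield record
            record = {}
        else:
            record.setdefault(tag, []).append(value)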

Owen

On Wed, Apr 28, 2010 at 9:17 AM, Jakob Voss  wrote:

> Hi
>
> it's funny how quickly you vote against BibTeX, but at least it is a format
> that is frequently used in the wild to create citations. If you call BibTeX
> undocumented and garbage then how do you call MARC which is far more
> difficult to make use of?
>
> My assumption was that there is a specific use case for bibliographic data
> in twitter annotations:
>
> I. Identifiy publication => this can *only* be done seriously with
> identifiers like ISBN, DOI, OCLCNum, LCCN etc.
>
> II. Deliver a citation => use a citation-oriented format (BibTeX, CSL, RIS)
>
> I was not voting explicitly for BibTeX but at least there is a large
> community that can make use of it. I strongly favour CSL (
> http://citationstyles.org/) because:
>
> - there is a JavaScript CSL-Processor. JavaScript is kind of a punishment
> but it is the natural environment for the Web 2.0 Mashup crowd that is going
> to implement applications that use Twitter annotations
>
> - there are dozens of CSL citation styles so you can display a citation in
> any way you want
>
> As Ross pointed out RIS would be an option too, but I miss the easy open
> source tools that use RIS to create citations from RIS data.
>
> Any other relevant format that I know (Bibont, MODS, MARC etc.) does not
> aim at identification or citation at the first place but tries to model the
> full variety of bibliographic metadata. If your use case is
>
> III. Provide semantic properties and connections of a publication
>
> Then you should look at the Bibliographic Ontology. But III does *not*
> "just subsume" usecase II. - it is a different story that is not beeing told
> by normal people but only but metadata experts, semantic web gurus, library
> system developers etc. (I would count me to this groups). If you want such
> complex data then you should use other systems but Twitter for data exchange
> anyway.
>
> A list of CSL metadata fields can be found at
>
> http://citationstyles.org/downloads/specification.html#appendices
>
> and the JavaScript-Processor (which is also used in Zotero) provides more
> information for developers: http://groups.google.com/group/citeproc-js
>
> Cheers
> Jakob
>
> P.S: An example of a CSL record from the JavaScript client:
>
> {
> "title": "True Crime Radio and Listener Disenchantment with Network
> Broadcasting, 1935-1946",
>  "author": [ {
>"family": "Razlogova",
>"given": "Elena"
>  } ],
>  "container-title": "American Quarterly",
>  "volume": "58",
>  "page": "137-158",
>  "issued": { "date-parts": [ [2006, 3] ] },
>  "type": "article-journal"
>
> }
>
>
> --
> Jakob Voß , skype: nichtich
> Verbundzentrale des GBV (VZG) / Common Library Network
> Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
> +49 (0)551 39-10242, http://www.gbv.de
>



-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


Re: [CODE4LIB] Twitter annotations and library software

2010-04-28 Thread Owen Stephens
Unfortunately RefWorks only imports DO - not exports! We now recommend using
RefWorks XML when exporting (for our project) - which is fine, but not
publicly documented as far as I know :(

Zotero recommend using BibTex for importing from RefWorks I think

Owen

On Wed, Apr 28, 2010 at 2:05 PM, Walker, David  wrote:

> I was also just working on DOI with RIS.
>
> It looks like both Endnote and Refworks recognize 'DO' for DOIs.  But
> apparently Zotero does not.  If Zotero supported it, I'd say we'd have a de
> facto standard on our hands.
>
> In fact, I couldn't figure out how to pass a DOI to Zotero using RIS.  Or,
> at least, in my testing I never saw the DOI show-up in Zotero.  I don't
> really use Zotero, so I may have missed it.
>
> --Dave
>
> ==
> David Walker
> Library Web Services Manager
> California State University
> http://xerxes.calstate.edu
> ____
> From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Owen
> Stephens [o...@ostephens.com]
> Sent: Wednesday, April 28, 2010 2:26 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] Twitter annotations and library software
>
> We've had problems with RIS on a recent project. Although there is a
> specification (http://www.refman.com/support/risformat_intro.asp), it is
> (I
> feel) lacking enough rigour to ever be implemented consistently. The most
> common issue in the wild that I've seen is use of different tags for the
> same information (which the specification does not nail down enough to know
> when each should be used):
>
> Use of TI or T1 for primary title
> Use of AU or A1 for primary author
> Use of UR, L1 or L2 to link to 'full text'
>
> Perhaps more significantly the specification doesn't include any field
> specifically for a DOI, but despite this EndNote (owned by ISI
> ResearchSoft,
> who are also responsible for the RIS format specification) includes the DOI
> in a DO field in its RIS output - not to specification.
>
> Owen
>
> On Wed, Apr 28, 2010 at 9:17 AM, Jakob Voss  wrote:
>
> > Hi
> >
> > it's funny how quickly you vote against BibTeX, but at least it is a
> format
> > that is frequently used in the wild to create citations. If you call
> BibTeX
> > undocumented and garbage then how do you call MARC which is far more
> > difficult to make use of?
> >
> > My assumption was that there is a specific use case for bibliographic
> data
> > in twitter annotations:
> >
> > I. Identifiy publication => this can *only* be done seriously with
> > identifiers like ISBN, DOI, OCLCNum, LCCN etc.
> >
> > II. Deliver a citation => use a citation-oriented format (BibTeX, CSL,
> RIS)
> >
> > I was not voting explicitly for BibTeX but at least there is a large
> > community that can make use of it. I strongly favour CSL (
> > http://citationstyles.org/) because:
> >
> > - there is a JavaScript CSL-Processor. JavaScript is kind of a punishment
> > but it is the natural environment for the Web 2.0 Mashup crowd that is
> going
> > to implement applications that use Twitter annotations
> >
> > - there are dozens of CSL citation styles so you can display a citation
> in
> > any way you want
> >
> > As Ross pointed out RIS would be an option too, but I miss the easy open
> > source tools that use RIS to create citations from RIS data.
> >
> > Any other relevant format that I know (Bibont, MODS, MARC etc.) does not
> > aim at identification or citation at the first place but tries to model
> the
> > full variety of bibliographic metadata. If your use case is
> >
> > III. Provide semantic properties and connections of a publication
> >
> > Then you should look at the Bibliographic Ontology. But III does *not*
> > "just subsume" usecase II. - it is a different story that is not beeing
> told
> > by normal people but only but metadata experts, semantic web gurus,
> library
> > system developers etc. (I would count me to this groups). If you want
> such
> > complex data then you should use other systems but Twitter for data
> exchange
> > anyway.
> >
> > A list of CSL metadata fields can be found at
> >
> > http://citationstyles.org/downloads/specification.html#appendices
> >
> > and the JavaScript-Processor (which is also used in Zotero) provides more
> > information for developers: http://groups.google.com/group/citeproc-js
> >
> > Cheers
> > Jakob
> >
> > P.S: An example of a CSL record from the JavaScript client:
> 

Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)

2010-04-30 Thread Owen Stephens
Dead ends from OpenURL enabled hyperlinks aren't a result of the standard
though, but rather an aspect of both the problem they are trying to solve,
and the conceptual way they try to do this.

I'd contend these dead ends are an implementation issue - and despite this I
have to say that my experience on the ground is that feedback from library
users on the use of link resolvers is positive - much more so than many of
the other library systems I've been involved with.

What I do see as a problem is that this market seems to have essentially
stagnated, at least as far as I can see. I suspect the reasons for this are
complex, but it would be nice to see some more innovation in this area.

Owen

On Thu, Apr 29, 2010 at 6:14 PM, Ed Summers  wrote:

> On Thu, Apr 29, 2010 at 12:08 PM, Eric Hellman  wrote:
> > Since this thread has turned into a discussion on OpenURL...
> >
> > I have to say that during the OpenURL 1.0 standardization process, we
> definitely had moments of despair. Today, I'm willing to derive satisfaction
> from "it works" and overlook shortcomings. It might have been otherwise.
>
> Personally, I've followed enough OpenURL enabled hyperlink dead ends
> to contest "it works".
>
> //Ed
>



-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


Re: [CODE4LIB] Twitter annotations and library software

2010-04-30 Thread Owen Stephens
Alex,

Could you expand on how you think the problem that OpenURL tackles would
have been better approached with existing mechanisms? I'm not debating this
necessarily, but from my perspective when OpenURL was first introduced it
solved a real problem that I hadn't seen solved before.

Owen

On Thu, Apr 29, 2010 at 11:55 PM, Alexander Johannesen <
alexander.johanne...@gmail.com> wrote:

> Hi,
>
> On Thu, Apr 29, 2010 at 22:47, Walker, David  wrote:
> > I would suggest it's more because, once you step outside of the
> > primary use case for OpenURL, you end-up bumping into *other* standards.
>
> These issues were raised all the back when it was created, as well. I
> guess it's easy to be clever in hindsight. :) Here's what I wrote
> about it 5 years ago (http://shelter.nu/blog-159.html) ;
>
> So let's talk about 'Not invented here' first, because surely, we're
> all guilty of this one from time to time. For example, lately I dug
> into the ANSI/NISO Z39.88 -2004 standard, better known as OpenURL. I
> was looking at it critically, I have to admit, comparing it to what I
> already knew about Web Services, SOA, http,
> Google/Amazon/Flickr/Del.icio.us API's, and various Topic Maps and
> semantic web technologies (I was the technical editor of Explorers
> Guide to the Semantic Web)
>
> I think I can sum up my experiences with OpenURL as such; why? Why
> have the library world invented a new way of doing things that already
> can be done quite well already? Now, there is absolutely nothing wrong
> with the standard per se (except a pretty darn awful choice of
> name!!), so I'm not here criticising the technical merits and the work
> put into it. No, it's a simple 'why' that I have yet to get a decent
> answer to, even after talking to the OpenURL bigwigs about it. I mean,
> come on; convince me! I'm not unreasonable, no truly, really, I just
> want to be convinced that we need this over anything else.
>
>
> Regards,
>
> Alex
> --
>  Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
> --- http://shelter.nu/blog/ --
> -- http://www.google.com/profiles/alexander.johannesen ---
>



-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


Re: [CODE4LIB] Twitter annotations and library software

2010-04-30 Thread Owen Stephens
Tim,

I'd vote for adopting the same approach as COinS, on the basis that it already has
some level of adoption, and we know it covers at least some of the stuff
libraries and academic users (as used by both libraries and consumer tools
such as Zotero) might want to do. We are talking Books (from what you've
said), so we don't have to worry about other formats (although it does mean
we can do journal articles and some other stuff as well for no effort).

Mendeley and Zotero already speak COinS, it is pretty simple, and there are
already several code libraries to deal with it.

It isn't where I hope we end up in the long term, but if we are talking about this
happening tomorrow, why not use something that is relatively simple, already
has a good set of implementations, and that we know works for several cases of
embedding book metadata in a web environment?
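
To illustrate how lightweight it is, here's a sketch (Python) of generating the COinS span for a book using the standard Z39.88 book KEV keys - dummy data, obviously:

from urllib.parse import urlencode
from html import escape

def book_coins(title, author, isbn):
    """Build a COinS span: an OpenURL 1.0 KEV context object in the title attribute."""
    kev = {
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:book",
        "rft.btitle": title,
        "rft.au": author,
        "rft.isbn": isbn,
    }
    return '<span class="Z3988" title="%s"></span>' % escape(urlencode(kev))

print(book_coins("An Example Book", "Author, An", "9780123456789"))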

Owen

On Thu, Apr 29, 2010 at 7:01 PM, Jakob Voss  wrote:

> Dear Tim,
>
>
> you wrote:
>
>> So this is my recommended framework for proceeding. Tim, I'm afraid
>>> you'll actually have to do the hard work yourself.
>>>
>>
>> No, I don't. Because the work isn't fundamentally that hard. A
>> complex standard might be, but I never for a moment considered
>> anything like that. We have *512 bytes*, and it needs to be usable by
>> anyone. Library technology is usually fatally over-engineered, but
>> this is a case where that approach isn't even possible.
>>
>
> Jonathan did a very well summary - you just have to pick what you main
> focus of embedding bibliographic data is.
>
>
> A) I favour using the CSL-Record format which I summarized at
>
> http://wiki.code4lib.org/index.php/Citation_Style_Language
>
> because I had in mind that people want to have a nice looking citation of
> the publication that someone tweeted about. The drawback is that CSL is less
> adopted and will not always fit in 512 bytes
>
>
> B) If you main focus is to link Tweets about the same publication (and
> other stuff about this publication) than you must embed identifiers.
> LibraryThing is mainly based on two identifiers
>
> 1) ISBN to identify editions
> 2) LT work ids to identify works
>
> I wonder why LT work ids have not picked up more although you thankfully
> provide a full mapping to ISBN at
> http://www.librarything.com/feeds/thingISBN.xml.gz but nevermind. I
> thought that some LT records also contain other identifiers such as OCLC
> number, LOC number etc. but maybe I am wrong. The best way to specify
> identifiers is to use an URI (all relevant identifiers that I know have an
> URI form). For ISBN it is
>
> uri:isbn:{ISBN13}
>
> For LT Work-ID you can use the URL with your .com top level domain:
>
> http://www.librarything.com/work/{LTWORKID}
>
> That would fit for tweets about books with an ISBN and for tweets about a
> work which will make 99.9% of tweets from LT about single publications
> anyway.
>
>
> C) If your focus is to let people search for a publication in libraries
> than and to copy bibliographic data in reference management software then
> COinS is a way to go. COinS is based on OpenURL which I and others ranted
> about because it is a crapy library standard like MARC. But unlike other
> metadata formats COinS usually fits in less then 512 bytes. Furthermore you
> may have to deal with it for LibraryThing for libraries anyway.
>
>
> Although I strongly favour CSL as a practising library scientist and
> developer I must admit that for LibraryThing the best way is to embed
> identifiers (ISBN and LT Work-ID) and maybe COinS. As long as LibraryThing
> does not open up to more complex publications like preprints of
> proceeding-articles in series etc. but mainly deals with books and works
> this will make LibraryThing users happy.
>
>
>  Then, three years from now, we can all conference-tweet about a CIL talk,
>> about all the cool ways libraries are using Twitter, and how it's such a
>> shame that the annotations standard wasn't designed with libraries in mind.
>>
>
> How about a bet instead of voting. In three years will there be:
>
> a) No relevant Twitter annotations anyway
> b) Twitter annotations but not used much for bibliographic data
> c) A rich variety of incompatible bibliographic annotation standards
> d) Semantic Web will have solved every problem anyway
> ..
>
> Cheers
> Jakob
>
> --
> Jakob Voß , skype: nichtich
> Verbundzentrale des GBV (VZG) / Common Library Network
> Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
> +49 (0)551 39-10242, http://www.gbv.de
>



-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


Re: [CODE4LIB] Twitter annotations and library software

2010-04-30 Thread Owen Stephens
Thanks Alex,

This makes sense, and yes, I see what you're saying - if you end up going back
to custom coding because it's easier, it does seem to defeat the purpose.

However I'd argue that actually OpenURL 'succeeded' because it did manage to
get some level of acceptance (ignoring the question of whether it is v0.1 or
v1.0) - the cost of developing 'link resolvers' would have been much higher
if we'd been doing something different for each publisher/platform. In this
sense (I'd argue) sometimes crappy standards are better than none.

We've used OpenURL v1.0 in a recent project and because we were able to
simply pick up code already done for Zotero, and  we already had an OpenURL
resolver, the amount of new code we needed for this was minimal.

I think the point about Link Resolvers doing stuff that Apache and CGI
scripts were already doing is a good one - and I've argued before that what
we actually should do is separate some of this out (a bit like Jonathan did
with Umlaut) into an application that can answer questions about location
(what is generally called the KnowledgeBase in link resolvers) and the
applications that deal with analysing the context and the redirection

(To introduce another tangent in a tangential thread, interestingly (I
think!) I'm having a not dissimilar debate about Linked Data at the moment -
there are many who argue that it is too complex and that as long as you have
a nice RESTful interface you don't need to get bogged down in ontologies and
RDF etc. I'm still struggling with this one - my instinct is that it will
pay to standardise but so far I've not managed to convince even myself this
is more than wishful thinking at the moment)

Owen

On Fri, Apr 30, 2010 at 10:33 AM, Alexander Johannesen <
alexander.johanne...@gmail.com> wrote:

> On Fri, Apr 30, 2010 at 18:47, Owen Stephens  wrote:
> > Could you expand on how you think the problem that OpenURL tackles would
> > have been better approached with existing mechanisms?
>
> As we all know, it's pretty much a spec for a way to template incoming
> and outgoing URLs, defining some functionality along the way. As such,
> URLs with basic URI templates and rewriting have been around for a
> long time. Even longer than that is just the basics of HTTP which have
> status codes and functionality to do exactly the same. We've been
> doing link resolving since mid 90's, either as CGI scripts, or as
> Apache modules, so none of this were new. URI comes in, you look it up
> in a database, you cross-check with other REQUEST parameters (or
> sessions, if you must, as well as IP addresses) and pop out a 303
> (with some possible rewriting of the outgoing URL) (with the hack we
> needed at the time to also create dummy pages with META tags
> *shudder*).
>
> So the idea was to standardize on a way to do this, and it was a good
> idea as such. OpenURL *could* have had a great potential if it
> actually defined something tangible, something concrete like a model
> of interaction or basic rules for fishing and catching tokens and the
> like, and as someone else mentioned, the 0.1 version was quite a good
> start. But by the time when 1.0 came out, all the goodness had turned
> so generic and flexible in such a complex way that handling it turned
> you right off it. The standard also had a very difficult language, and
> more specifically didn't use enough of the normal geeky language used
> by sysadmins around. The more I tried to wrap my head around it, the
> more I felt like just going back to CGI scripts that looked stuff up
> in a database. It was easier to hack legacy code, which, well, defeats
> the purpose, no?
>
> Also, forgive me if I've forgotten important details; I've suppressed
> this part of my life. :)
>
>
> Kind regards,
>
> Alex
> --
>  Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
> --- http://shelter.nu/blog/ --
> -- http://www.google.com/profiles/alexander.johannesen ---
>



-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


Re: [CODE4LIB] it's cool to hate on OpenURL (was: Twitter annotations...)

2010-04-30 Thread Owen Stephens
Although part of the problem is that you might want to offer any service on
the basis of an OpenURL, the major use case is supply of a document (either
online or via ILL) - so it strikes me you could look at DAIA
http://www.gbv.de/wikis/cls/DAIA_-_Document_Availability_Information_API ?
Jakob does this make sense?

Owen

On Fri, Apr 30, 2010 at 3:08 PM, Eric Hellman  wrote:

> OK, what does the EdSuRoSi spec for OpenURL responses say?
>
> Eric
>
> On Apr 30, 2010, at 9:40 AM, Ed Summers wrote:
>
> > On Fri, Apr 30, 2010 at 9:09 AM, Ross Singer 
> wrote:
> >> I actually think this lack of any specified response format is a large
> >> factor in the stagnation of OpenURL as a technology.  Since a resolver
> >> is under no obligation to do anything but present a web page it's
> >> difficult for local entrepreneurial types to build upon the
> >> infrastructure simply because there are no guarantees that it will
> >> work anywhere else (or even locally, depending on your vendor, I
> >> suppose), much less contribute back to the ecosystem.
> >
> > I agree. And that's an issue with the standard, not the implementations.
> >
> > //Ed
>
> Eric Hellman
> President, Gluejar, Inc.
> 41 Watchung Plaza, #132
> Montclair, NJ 07042
> USA
>
> e...@hellman.net
> http://go-to-hellman.blogspot.com/
>



-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


[CODE4LIB] Aquabrowser, SRU/SRW and other output formats

2010-05-26 Thread Owen Stephens
In this document
http://www.docstoc.com/docs/2286293/AquaBrowser-Library-FAQ-ADA-Compliant (and
I've no idea what the source of this is) it says:

"For data output, Aquabrowser supports RSS using either Dublin Core (DC) or
MarcXML metadata schemas. Also search results can be retrieved in SRU/SRW
with either Dublin Core (DC) or MarcXML schemas.

Communication to the client browser supports XML and JSON formats"

Does anyone know if this is true? If so, is this part of the basic product
or an add-on? My local public library has recently started to use
Aquabrowser, and I'm interested in whether I can get access to search
results etc. in a nice format for reuse etc.
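
For what it's worth, if it does expose SRU then pulling MARCXML out should just be a standard searchRetrieve request. A sketch (Python), assuming a conventional SRU 1.1 endpoint at a made-up base URL - nothing Aquabrowser-specific:

from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical base URL - substitute whatever endpoint the product actually exposes.
BASE = "http://library.example.org/sru"

params = {
    "operation": "searchRetrieve",
    "version": "1.1",
    "query": 'dc.title = "open data"',   # CQL query
    "recordSchema": "marcxml",           # schema names vary by server; "dc" is the other common one
    "maximumRecords": "10",
}

response = urlopen(BASE + "?" + urlencode(params))
print(response.read()[:500])  # start of the raw SRU XML response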

Thanks

Owen

-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


Re: [CODE4LIB] locator

2010-06-30 Thread Owen Stephens
Hi Tom,

The mapping the library project started out (in my head) as simply using 
existing mapping tools to provide an interface to a map. The way the project 
went when we sat down and played for a day was slightly different, although 
still vaguely interesting :)

The thinking behind using Google Maps (which would apply to other 'mapping' 
interfaces - e.g. OpenLayers) was simply that you get a set of tools that are 
designed to help navigation around a physical space. You can dispense with the 
geographic representation and simply use your own floorplan images. Whether 
this is the way to go probably depends on your requirements - but you would get 
functions like the ability to drop markers etc. 'for free' as it were, and also 
a well documented approach as the GMaps etc APIs come with good documentation.

However, more than once it has been suggested that this is a more complex 
approach than is required (I'm still not convinced by this - I think there are 
real strengths to this 'off the shelf' approach)

Some other bits and pieces that may be of interest:

My writeup of the day we worked on the Mapping the Library project 
http://www.meanboyfriend.com/overdue_ideas/2009/12/mashing-and-mapping/
A JISC funded project looking at producing an 'item locator' service at the LSE 
http://findmylibrarybook.blogspot.com/

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 30 Jun 2010, at 13:24, Tom Vanmechelen wrote:

> We're considering  to expand our service with a item locator. "Mapping the 
> library" (http://mashedlibrary.com/wiki/index.php?title=Mapping_the_library) 
> describes how to build this with Google maps. But is this really the way to 
> go?  Does anyone has any experience with this? Does anyone have some best 
> practices for this kind of project knowing that we have about 20 buildings 
> spread all over the town? 
> 
> Tom
> 
> ---
> Tom Vanmechelen
> 
> K.U.Leuven / LIBIS
> W. De Croylaan 54 bus 5592
> BE-3001 Heverlee
> Tel  +32 16 32 27 93


Re: [CODE4LIB] DIY aggregate index

2010-07-01 Thread Owen Stephens
As others have suggested I think much of this is around the practicalities
of negotiating access, and the server power & expertise needed to run the
service - simply more efficient to do this in one place.

For me the change that we need to open this up is for publishers to start
pushing out a lot more of this data to all comers, rather than having to
have this conversation several times over with individual sites or
suppliers. How practical this is I'm not sure - especially as we are talking
about indexing full-text where available (I guess). I think the Google News
model (5-clicks free) is an interesting one - but not sure whether this, or
a similar approach, would work in a niche market which may not be so
interested in total traffic.

It seems (to me) so obviously in the publishers' interest for their content to
be as easily discoverable as possible that I am optimistic they will
gradually become more open to sharing more data that aids this - at least
metadata. I'd hope that this would eventually open up the market to a
broader set of suppliers, as well as institutions doing their own thing.

Owen

On Thu, Jul 1, 2010 at 2:37 AM, Eric Lease Morgan  wrote:

> On Jun 30, 2010, at 8:43 PM, Blake, Miriam E wrote:
>
> > We have locally loaded records from the ISI databases, INSPEC,
> > BIOSIS, and the Department of Energy (as well as from full-text
> > publishers, but that is another story and system entirely.) Aside
> > from the contracts, I can also attest to the major amount of
> > work it has been. We have 95M bibliographic records, stored in >
> > 75TB of disk, and counting. Its all running on SOLR, with a local
> > interface and the distributed aDORe repository on backend. ~ 2
> > FTE keep it running in production now.
>
>
> I definitely think what is outlined above -- local indexing -- is the way
> to go in the long run. Get the data. Index it. Integrate it into your other
> system. Know that you have it when you change or drop the license. No
> renting of data. And, "We don't need no stinkin' interfaces!" I believe a
> number of European institutions have been doing this for a number of years.
> I hear a few of us in the United States following suit.  ++
>
> --
> Eric Morgan
> University of Notre Dame.
>



-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com


Re: [CODE4LIB] "universal citation index"

2010-07-21 Thread Owen Stephens
Since no one has mentioned it yet, and it seems like it might be relevant, it 
may be worth looking at the CITO (Citation Ontology) (see 
http://imageweb.zoo.ox.ac.uk/pub/2008/publications/Shotton_ISMB_BioOntology_CiTO_final_postprint.pdf
 and 
http://imageweb.zoo.ox.ac.uk/pub/2009/citobase/cito-20091124-1.4/cito-content/owldoc/)

It is important to note that CITO describes the nature of a citation, as 
opposed to describing the thing cited. It also suggests a different angle on 
what a citation is - that is, a citation is only a citation in context; 
otherwise it is simply a description of something.
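
To make that concrete: in CiTO the property carried by the citation says *why* the citing work cites the cited one, rather than describing either work. A rough sketch with rdflib - the namespace URI and property names below are from memory, so treat them as illustrative and check them against the ontology documentation linked above:

from rdflib import Graph, Namespace, URIRef

# Illustrative namespace - verify against the current CiTO documentation.
CITO = Namespace("http://purl.org/spar/cito/")

g = Graph()
g.bind("cito", CITO)

citing = URIRef("http://example.org/articles/my-paper")
cited = URIRef("http://example.org/articles/earlier-paper")

# The relationship captures the role the citation plays in context.
g.add((citing, CITO.supports, cited))
g.add((citing, CITO.usesDataFrom, cited))

print(g.serialize(format="turtle"))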

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 20 Jul 2010, at 22:53, Young,Jeff (OR) wrote:

> I suspect this discussion happened on code4lib before the thread got
> cross-posting to LLD XG where I first saw it.
> 
> There are undoubtedly a ton of diverse use cases, but that doesn't mean
> APIs are the best solution. Here are some spitball possibilities for
> "not just manifestations" and "we need page numbers".
> 
> http://example.org/frbr:serial/2/citation-apa.{bcp-47}.txt
> http://example.org/frbr:manifestation/1/citation-apa.{bcp-47}.txt?xyz:st
> artPage=5&xyz:endPage=6  
> 
> I'm imagining an xyz ontology with startPage and endPage, but we can
> surely create it if something doesn't already exist.
> 
> Jeff
> 
>> -Original Message-
>> From: Tom Morris [mailto:tfmor...@gmail.com]
>> Sent: Tuesday, July 20, 2010 5:37 PM
>> To: Young,Jeff (OR)
>> Cc: Karen Coyle; Jodi Schneider; public-lld; Code for Libraries; Brian
>> Mingus
>> Subject: Re: "universal citation index"
>> 
>> On Tue, Jul 20, 2010 at 1:40 PM, Young,Jeff (OR) 
>> wrote:
>>> In terms of Linked Data, it should make sense to treat citations as
>>> text/plain variant representations of a FRBR Manifestation.
>> 
>> As Karen mentioned, many types of citation need more information than
>> just the manifestation.  You also need pages numbers, etc.
>> 
>> Tom
> 
> 
> 


Re: [CODE4LIB] URL checking for the catalog

2012-02-24 Thread Owen Stephens
It's not quite the same thing, but I worked on a project a couple of years ago 
integrating references/citations into a learning environment (called Telstar, 
http://www8.open.ac.uk/telstar/), and looked at the question of how to deal 
with broken links from references.

We proposed a more reactive mechanism than running link checking software. This 
clearly has some disadvantages, but I think a major advantage is the targeting 
of staff time towards those links that are being used. The mechanism proposed 
was to add a level of redirection, with an intermediary script checking the 
availability of the destination URL before either:

a) passing the user on to the destination
b) finding the destination URL unresponsive (e.g. 404), automatically reporting 
the issue to library staff, and directing the user to a page explaining that 
the resource was not currently responding and that library staff had been 
informed
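
The check itself needn't be anything clever. A rough sketch (Python) of the shape of such an intermediary script, with a hypothetical notify_staff() standing in for however you report the problem - this isn't the SFX source parser we actually built, just the general idea:

from urllib import error, request

def notify_staff(url, problem):
    # Placeholder: in practice this might email the team or raise a ticket.
    print("Broken link reported: %s (%s)" % (url, problem))

def check_and_redirect(destination_url):
    """Return (HTTP status, Location or message) for the intermediary script."""
    try:
        # Some sites reject HEAD requests; falling back to GET is an easy refinement.
        request.urlopen(request.Request(destination_url, method="HEAD"), timeout=10)
    except (error.HTTPError, error.URLError) as err:
        notify_staff(destination_url, err)
        return ("200 OK",
                "Sorry - this resource isn't responding at the moment; "
                "library staff have been told.")
    return ("302 Found", destination_url)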

Particularly we proposed putting the destination URL into the rft_id of an 
OpenURL to achieve this, but this was only because it allowed us to piggyback 
on existing infrastructure using a standard approach - you could do the same 
with a simple script, with the destination URL as a parameter (if you are 
really interested, we created a new Source parser in SFX to do (a) and (b)). 
Because we didn't necessarily have control over the URL in the reference, we 
also built a table that allowed us to map broken URLs being used in the 
learning environment to alternative URLs so we could offer a temporary redirect 
while we worked with the relevant staff to get corrections made to the 
reference link.

There's some more on this at 
http://www.open.ac.uk/blogs/telstar/remit-toc/remit-the-open-university-approach/remit-providing-links-to-resources-from-references/6-8-3-telstar-approach/
 although for some reason (my fault) this doesn't include a write up of the 
link checking process/code we created.

Of course, this approach is in no way incompatible with regular proactive link 
checking.

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

On 23 Feb 2012, at 17:02, Tod Olson wrote:

> There's been some recent discussion at our site about revi(s|v)ing URL 
> checking in our catalog, and I was wondering if other sites have any 
> strategies that they have found to be effective.
> 
> We used to run some home-grown link checking software. It fit nicely into a 
> shell pipeline, so it was easy to filter out sites that didn't want to be 
> link checked. But still the reports had too many spurious errors. And with 
> over a million links in the catalog, there are some issues of scale, both for 
> checking the links and consuming any report.
> 
> Anyhow, if you have some system you use as part of catalog link maintenance, 
> or if there's some link checking software that you've had good experiences 
> with, or if there's some related experience you'd like to share, I'd like to 
> hear about it.
> 
> Thanks,
> 
> -Tod
> 
> 
> Tod Olson 
> Systems Librarian 
> University of Chicago Library

