Re: [CODE4LIB] online book price comparison websites?
I’ve liked bookfinder, but haven’t used it for a while. -Erik At Wed, 26 Feb 2014 15:19:00 -0500, Stephanie P Hess wrote: Try http://www.addall.com/. I used it all the time in my former incarnation as an Acquisitions Librarian. Cheers, Stephanie On Wed, Feb 26, 2014 at 3:14 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Anyone have any recommendations of online sites that compare online prices for purchasing books? I'm looking for recommendations of sites you've actually used and been happy with. They need to be searchable by ISBN. Bonus is if they have good clean graphic design. Extra bonus is if they manage to include shipping prices in their price comparisons. Thanks! Jonathan -- *Stephanie P. Hess* Electronic Resources Librarian Binghamton University Glenn G. Bartle Library 4400 East Vestal Parkway Vestal, NY 13902 607-777-2474 -- Sent from my free software system http://fsf.org/.
Re: [CODE4LIB] Python CMSs
At Thu, 13 Feb 2014 15:13:58 -0900, Coral Sheldon-Hess wrote: Hi, everyone! I've gotten clearance to totally rewrite my library's website in the framework/CMS of my choice (pretty much :)). As I have said on numerous occasions, if I can get paid to write Python, I want to do that! So, after some discussion with my department head/sysadmin, we're leaning toward Django. Hi Coral, My two cents: I think of Django as a CMS construction kit. (Keep in mind that it was originally developed for the newspaper business.) It’s probably more complicated to set up than Drupal, or a CMS built with Django, but I would guess that the time you save up front with a ready-made CMS will be more than eaten up later on, when you want to make it do something it wasn’t intended to do. With its out-of-the-box admin interface and an HTML editor, you have something that admins can use to generate content with hardly any work on your part. Basically - and this is my personal opinion, with only somewhat limited CMS experience - the CMS experiment has failed. The dream of something that is “customized”, not programmed, has never succeeded and never will. Django allows you to put together a “CMS” that fits your *exact* needs, with a little more up-front work, rather than something that requires less up-front work, but a lot more down the road. best, Erik -- Sent from my free software system http://fsf.org/.
Re: [CODE4LIB] Code4lib 2014 Diversity Scholarships: Call for Applications
Hi all, I can’t believe we are having this conversation again. I have nothing to add except to say that rather than feed the troll, you might do what I did, and turn your frustration at this thread arising *once again* into a donation to the Ada Initiative or a similar organization. Sadly, it seems that one cannot contribute directly to the diversity scholarships, though I would be happy to do so. If anybody knows how, please let me know. best, Erik
Re: [CODE4LIB] Tool for feedback on document
At Wed, 16 Oct 2013 11:06:02 -0700, Walker, David wrote: Hi all, We're looking to put together a large policy document, and would like to be able to solicit feedback on the text from librarians and staff across two dozen institutions. We could just do that via email, of course. But I thought it might be better to have something web-based. A wiki is not the best solution here, as I don't want those providing feedback to be able to change the text itself, but rather just leave comments. My fallback plan is to just use WordPress, breaking the document up into various pages or posts, which people can then comment on. But it seems to me there must be a better solution here -- maybe one where people can leave comments in line with the text? Hi David, For the GPLv3 process, the Free Software Foundation developed a web application named stet for annotating and commenting on a text. Apparently the successor to that is considered to be co-ment [1], which has a gratis “lite” version [2]. That might meet your needs. I’ve never tried it. best, Erik 1. http://www.co-ment.com/ 2. https://lite.co-ment.com/ Sent from my free software system http://fsf.org/.
Re: [CODE4LIB] [CODE4LIB] HEADS UP - Government shutdown will mean *.loc.gov is going offline October 1
At Mon, 30 Sep 2013 15:31:40 -0500, Becky Yoose wrote: FYI - this also means that there's a very good chance that the MARC standards site [1] and the Source Codes site [2] will be down as well. I don't know if there are any mirror sites out there for these pages. Thanks, Becky, about to be (forcefully) departed with her standards documentation Hi Becky, Well, there’s always archive.org: http://web.archive.org/web/20130816154112/http://www.loc.gov/marc/ best, Erik Sent from my free software system http://fsf.org/.
Re: [CODE4LIB] A question about voting points
At Mon, 1 Apr 2013 12:01:13 -0400, David J. Fiander wrote: So, I just voted for the Code4Lib 2014 location. There are two possible venues, and I was given three points to apportion however I wish. While having multiple votes, to spread around at will, makes a lot of sense, shouldn't the number of votes each elector is granted be limited to min(3, count(options)-1)? That is, when voting for a binary, I get one vote; when voting on a choice of three items, I get two votes; and for anything more than three choices, I get three votes? I mean, realistically, one could give one vote to Austin and two votes to Raleigh, but why bother? Hi David, You actually can vote 0-3 on any option, for as many total votes as you like. The optimal strategy, assuming that you actually prefer one option to another, is to vote 3 for the option you prefer and 0 for all others. To slightly change the subject, voting systems are a policy decision, not a technical problem. In the case of voting for presentations (more important to me than conference location), different voting systems will generate a different mix of presentations. Think of the difference between the American Congress and a parliamentary system. The question is, does code4lib want conference presentations that are more “first past the post” [1], or more representative of the diversity of interests of the code4lib crowd (like a parliamentary system)? The existing system reduces to a first-past-the-post system, which means that the presentations which more people prefer win, rather than presentations that a smaller group of people might feel strongly about. This is a question that shouldn’t be decided by the technology; the policy should decide the technology. A Google form might work, and certainly hand-counted emailed votes would, given the relative smallness of the c4l community. Those who are interested can read more here: http://en.wikipedia.org/wiki/Voting_system best, Erik 1.
http://en.wikipedia.org/wiki/First-past-the-post_voting Sent from my free software system http://fsf.org/.
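A quick way to see Erik's point about the dominant strategy is a small simulation (a sketch with made-up ballots; the voter counts are hypothetical): when every voter gives 3 points to their favourite and 0 to everything else, the 0-3 point tally is just three times a plurality count, i.e. first past the post.

```python
from collections import Counter

def tally(ballots):
    """Sum the 0-3 point scores per option across all ballots."""
    totals = Counter()
    for ballot in ballots:
        for option, points in ballot.items():
            totals[option] += points
    return totals

def strategic_ballot(preference, options):
    """The dominant strategy Erik describes: 3 for your favourite, 0 elsewhere."""
    return {opt: (3 if opt == preference else 0) for opt in options}

options = ["Austin", "Raleigh"]
# Hypothetical electorate: 5 voters prefer Raleigh, 4 prefer Austin.
preferences = ["Raleigh"] * 5 + ["Austin"] * 4
ballots = [strategic_ballot(p, options) for p in preferences]
totals = tally(ballots)
# Under universal strategic voting, the score tally is just 3x the
# plurality count, so the winner is the first-past-the-post winner.
winner = max(totals, key=totals.get)
```

Sincere ballots (e.g. 3 for one venue, 2 for the other) would change the margins, which is exactly why the choice of system is a policy question.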
Re: [CODE4LIB] GitHub Myths (was thanks and poetry)
At Wed, 20 Feb 2013 11:20:33 -0500, Shaun Ellis wrote: (As a general rule, for every programmer who prefers tool A, and says that everybody should use it, there’s a programmer who disparages tool A, and advocates tool B. So take what we say with a grain of salt!) It doesn't matter what tools you use, as long as you and your team are able to participate easily, if you want to. But if you want to attract contributions from a given development community, then choices should be balanced between the preferences of that community and what best serves the project. It does matter what tools you use, which is why people are so passionate about them. But I agree completely that you need to balance the preferences of the community. From what I've been hearing, I think there is a lot of confusion about GitHub. Heck, I am constantly learning about new GitHub features, APIs, and best practices myself. But I find it to be an incredibly powerful platform for moving open source, distributed software development forward. I am not telling anyone to use GitHub if they don't want to, but I want to dispel a few myths I've heard recently: It’s not confusion; and these aren’t “myths”: they are disagreements. best, Erik Sent from my free software system http://fsf.org/.
Re: [CODE4LIB] GitHub Myths (was thanks and poetry)
At Wed, 20 Feb 2013 11:50:45 -0800, Tom Johnson wrote: but it would be difficult to replace the social network around the projects. Especially difficult now that GitHub is where the community is. It's technically possible to build a social web that works on a decentralized basis, but it may no longer be culturally possible. Platforms are hard to get down from. Maybe. Most people today use internet email, not Compuserve email; they use the web, not AOL keywords; and they use jabber/xmpp, not ICQ. I don’t think it’s unreasonable to think that people will eventually leave twitter for a status.net implementation, or github for something else. best, Erik Sent from my free software system http://fsf.org/.
Re: [CODE4LIB] thanks and poetry
At Sat, 16 Feb 2013 06:42:04 -0800, Karen Coyle wrote: GitHub may have excellent startup documentation, but that startup documentation describes git in programming terms, mainly using *nix commands. If you have never had to use a version control system (e.g. if you do not write code, especially in a shared environment), clone, push, and pull are very poorly described. The documentation is all in terms of *nix commands. Honestly, anything where this is in the documentation: On Windows systems, Git looks for the |.gitconfig| file in the |$HOME| directory (|%USERPROFILE%| in Windows’ environment), which is |C:\Documents and Settings\$USER| or |C:\Users\$USER| for most people, depending on version (|$USER| is |%USERNAME%| in Windows’ environment). is not going to work for anyone who doesn't work in Windows at the command line. No, git is NOT for non-coders. For what it’s worth, this programmer finds git’s interface pretty terrible. I prefer mercurial (hg), but I don’t know if it’s any better for people who aren’t familiar with a command line. http://mercurial.selenic.com/guide/ (As a general rule, for every programmer who prefers tool A, and says that everybody should use it, there’s a programmer who disparages tool A, and advocates tool B. So take what we say with a grain of salt!) (And as a further aside, there’s plenty to dislike about github as well, from its person-centric view of projects (rather than team-centric) to its unfortunate centralizing of so much free/open source software on one platform.) best, Erik Sent from my free software system http://fsf.org/.
[CODE4LIB] code4lib 2013 location
Hi all, Apparently code4lib 2013 is going to be held at the UIC Forum http://www.uic.edu/depts/uicforum/ I assumed it would be at the conference hotel. This is just a note so that others do not make the same assumption, since nowhere in the information about the conference is the location made clear. Since the conference hotel is 1 mile from the venue, I assume transportation will be available. best, Erik Hetzner Sent from my free software system http://fsf.org/.
Re: [CODE4LIB] Open source project questions
At Fri, 7 Dec 2012 14:58:11 -0500, Donna Campbell wrote: Dear Colleagues, I understand from a professional colleague, who referred me to this list, that there are some experienced open source programmers here. I am in the early stages of planning for a conference session/open source project in June 2013 for a different professional library organization. Here is the session title and description: […] Hi Donna, For understanding free/open source software development processes, you can’t beat Karl Fogel’s book, Producing open source software, available online: http://producingoss.com/ best, Erik Sent from my free software system http://fsf.org/.
Re: [CODE4LIB] anti-harassment policy for code4lib?
At Fri, 30 Nov 2012 11:34:41 +, MJ Ray wrote: Esmé Cowles escow...@ucsd.edu Also, I've seen a number of reports over the last few years of women who were harassed at predominately-male tech conferences. Taken together, they paint a picture of men (particularly drunken men) creating an atmosphere that makes a lot of people feel excluded and worry about being harassed or worse. So I think a positive statement of values, and the general raising of consciousness of these issues, is a good thing. I'm a member of software.coop, which helps write library software, including Koha - we co-hosted KohaCon12 this summer. Like all co-ops, our core values include equality. I would like to see an anti-harassment policy for code4lib. However, I'm saddened that I seem to be the first to object to the hand-waving (number of reports) and prejudice in the above paragraph. The above problems seem more likely to arise from being drunk or being idiots than from being men. […] Hi MJ, Starting from this incorrect position will lead to the wrong harassment guidelines being drawn up. Obviously the goal is equal respect, but you don’t get there by pretending that the root problem is drunkenness, or that men and women treat one another with disrespect in equal amounts. It’s not hand-waving to say that sexual harassment happens, and that (with negligible exceptions) it is men who are the perpetrators. To pretend otherwise will not produce an effective anti-harassment policy. best, Erik Sent from my free software system http://fsf.org/.
Re: [CODE4LIB] Any libraries have their sites hosted on Amazon EC2?
At Wed, 22 Feb 2012 23:34:14 +0100, Thomas Krichel wrote: Roy Tennant writes I'd also be interested in getting some real world cost information. I installed an app on EC2 that went mostly unused for a couple months but meanwhile racked up over $300 in charges. Color me surprised. I am not impressed by Amazon either. I have an instance given to me by a sponsor, and there I have been taken aback by the old Debian kernel version this puts me in. I rent three root servers with Hetzner.de. That's for large-scale work. To run a blog, a 3TB disk 16 Gig ram box from Hetzner is overkill. With Hetzner you have the exchange rate risk but the cost structure is much simpler. Another satisfied customer. best, Erik Hetzner PS: But seriously, no relation. Sent from my free software system http://fsf.org/.
Re: [CODE4LIB] Linux Laptop
At Wed, 14 Dec 2011 09:54:09 -0800, Chris Fitzpatrick wrote: Thanks everyone for all the recommendations. I knew this would be the list to ask. Sounds like Ubuntu is the overwhelming favorite. In the past when I've used Linux on a non-server computer, there were always some annoying problems... things like the laptop not waking from sleep mode, power consumption problems, or the microphone not working. So, I'm wondering about specific laptop brands/models and Linux distributions/versions that people have found to work really well. A Dell or ThinkPad with Ubuntu seems to be the popular choice? But, yeah, I know I started it, but I'm going to avoid going deeper into my opinions on Apple vs. Windows vs. Linux and the implications vis-à-vis productivity, copyright, social justice, and the plight of the polar bear. If only out of concern that introducing this discussion might cause the poor mail server at ND to melt down… For what it’s worth, I run Ubuntu happily on my old (2007) MacBook. The only really tricky part is the lack of two pointer buttons. So you don’t need to get rid of the Mac to switch off OS X. That said, I would not buy a Mac again, if only because Apple has gone into full-bore evil mode. Finally, in my biased experience, a system running Ubuntu is now more usable than a system running OS X. This is my experience, and I am not going to argue about it. :) I imagine it works even better if you buy a system that is certified or comes with Ubuntu pre-installed. And if you are interested in a netbook, although Ubuntu has discontinued the Ubuntu Netbook Edition, I think the Unity interface is pretty slick on a small netbook screen. best, Erik Sent from my free software system http://fsf.org/.
Re: [CODE4LIB] Patents and open source projects
At Mon, 5 Dec 2011 08:17:26 -0500, Emily Lynema wrote: A colleague approached me this morning with an interesting question that I realized I didn't know how to answer. How are open source projects in the library community dancing around technologies that may have been patented by vendors? We were particularly wondering about this in light of open source ILS projects, like Kuali OLE, Koha, and Evergreen. I know OLE is still in the early stages, but did the folks who created Koha and Evergreen ever run into any problems in this area? Have library vendors historically pursued patents for their systems and solutions? I don’t think libraries have a particularly unique perspective on this: most free/open source software projects have the same issues with patents. The Software Freedom Law Center has some basic information about these issues. As I recall, the “Legal basics for developers” edition of their podcasts is useful [1], but other editions may be helpful as well. Basically, the standard advice for patents is what Mike Taylor gave: ignore them. Pay attention to copyright and trademark issues (as the Koha problem shows), but patents really don’t need to be on your radar. best, Erik 1. http://www.softwarefreedom.org/podcast/2011/aug/16/Episode-0x16-Legal-Basics-for-Developers/ Sent from my free software system http://fsf.org/.
Re: [CODE4LIB] Web archiving and WARC
At Wed, 23 Nov 2011 18:30:02 -0500, Edward M. Corrado wrote: Hello All, I need to harvest a few Web sites in order to preserve them. I'd really like to preserve them using the WARC file format [1] since it is a standard for digital preservation. I looked at Web Curator Tool (WCT) and Heritrix, and they seem to be good at what they do, but they are built to work on a much larger scale than what I'd like to do -- and that comes with a cost of increased complexity. Tools like wget are simple to use and can easily be scripted to accomplish my limited task, except the standard wget and similar tools I am familiar with do not support WARC. Also, I haven't been able to find a tool that can convert zipped files created with wget to WARC. I did find a version of wget with WARC support built in [1] from the Archive Team, so that may be my solution, but compiling software with “dirty” written into the name of the zip file is maybe not the best long-term solution. Does anyone know of any other simple tools to create a WARC file (either from harvesting or converting a wget or similar mirror/archive)? Hi Edward, The WCT uses Heritrix behind the scenes. Basically Heritrix or wget+warc are your only two solutions, unless you convert to WARC from something else. And I have never seen another crawler that gathers the information that needs to go into the WARC file. Heritrix isn't that bad to get up and running. The trickier issue is what to do with the WARC files once you have them. best, Erik Sent from my free software system http://fsf.org/.
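For the very small-scale case Edward describes, the WARC container itself is simple enough to sketch by hand. The sketch below illustrates the record layout only; it is not a replacement for wget+warc or Heritrix, which also write warcinfo and request records, payload digests, and so on (the example URL and payload are invented):

```python
from datetime import datetime, timezone
from uuid import uuid4

def warc_response_record(url, http_payload):
    """Build one minimal WARC/1.0 'response' record as bytes.
    http_payload is the full captured HTTP response (status line,
    headers, and body), which is what a response record stores."""
    headers = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", "<urn:uuid:%s>" % uuid4()),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", url),
        ("Content-Type", "application/http; msgtype=response"),
        ("Content-Length", str(len(http_payload))),
    ]
    head = "WARC/1.0\r\n" + "".join("%s: %s\r\n" % (k, v) for k, v in headers)
    # A record is: header block, blank line, payload, then two CRLFs.
    return head.encode("utf-8") + b"\r\n" + http_payload + b"\r\n\r\n"

payload = b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
record = warc_response_record("http://example.org/", payload)
```

Concatenating such records (usually gzip-compressed per record) yields a .warc file; the hard part, as the reply notes, is everything around the format, not the format itself.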
Re: [CODE4LIB] Plea for help from Horowhenua Library Trust to Koha Community
At Tue, 22 Nov 2011 13:51:11 +1300, Joann Ransom wrote: Horowhenua Library Trust is the birth place of Koha and the longest serving member of the Koha community. Back in 1999 when we were working on Koha, the idea that 12 years later we would be having to write an email like this never crossed our minds. It is with tremendous sadness that we must write this plea for help to you, the other members of the Koha community. […] Hi Joann, The Software Freedom Law Center (http://softwarefreedom.org) might be able to help as well: The Software Freedom Law Center provides pro-bono legal services to developers of Free, Libre, and Open Source Software. They list trademark defense as one of their services. best, Erik Sent from my free software system http://fsf.org/.
Re: [CODE4LIB] internet explorer and pdf files
At Mon, 29 Aug 2011 15:30:56 -0400, Eric Lease Morgan wrote: I need some technical support when it comes to Internet Explorer (IE) and PDF files. Here at Notre Dame we have deposited a number of PDF files in a Fedora repository. Some of these PDF files are available at the following URLs: * http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:1000793/PDF1 * http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:832898/PDF1 * http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:999332/PDF1 * http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:832657/PDF1 * http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:1001919/PDF1 * http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:832818/PDF1 * http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:834207/PDF1 Retrieving the URLs with any browser other than IE works just fine. Unfortunately IE's behavior is weird. The first time someone tries to load one of these URLs, nothing happens. When someone tries to load another one, it loads just fine. When they re-try the first one, it loads. We are banging our heads against the wall here at Catholic Pamphlet Central. Networking issue? Port issue? IE PDF plug-in? Invalid HTTP headers? On-campus versus off-campus issue? Could some of y'all try to load some of the URLs with IE and tell me your experience? Other suggestions would be greatly appreciated as well. Hi Eric, As I recall, IE fetches PDFs oddly sometimes. It will do a GET, then interrupt it, then GET the favicon.ico, then resume the original GET using a Range header to request the rest of the PDF. (This info is copied from an email from May 2008, so it may be out of date.) Your server might not like that kind of abuse. Wireshark can be your friend here. Hope that helps! best, Erik Sent from my free software system http://fsf.org/.
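Erik's diagnosis can be checked server-side: if IE resumes the interrupted GET with a Range header, the server must answer with a 206 Partial Content and the requested byte span, or the PDF will appear to fail silently. A rough sketch of that logic (single `bytes=N-` and `bytes=N-M` ranges only; real servers implement the full HTTP/1.1 Range rules, and the sample bytes are invented):

```python
import re

def serve_range(body, range_header):
    """Return (status, payload, content_range) for a simple byte-range
    request. A sketch of HTTP/1.1 partial-response behaviour; it does
    not handle multi-range, suffix ranges, or If-Range."""
    if not range_header:
        return 200, body, None
    m = re.fullmatch(r"bytes=(\d+)-(\d*)", range_header.strip())
    if not m:
        return 416, b"", "bytes */%d" % len(body)
    start = int(m.group(1))
    end = int(m.group(2)) if m.group(2) else len(body) - 1
    if start >= len(body) or start > end:
        return 416, b"", "bytes */%d" % len(body)
    end = min(end, len(body) - 1)
    return 206, body[start:end + 1], "bytes %d-%d/%d" % (start, end, len(body))

pdf = b"%PDF-1.4 ...pretend pdf bytes..."
# IE's resumed fetch: it already has the first 8 bytes, asks for the rest.
status, chunk, crange = serve_range(pdf, "bytes=8-")
```

A server that answers such a request with a fresh 200 (or an error) is one plausible cause of the "first load does nothing" symptom, which is why Wireshark is the right next step.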
Re: [CODE4LIB] to link or not to link: PURLs
At Wed, 26 Jan 2011 13:57:42 -0600, Pottinger, Hardy J. wrote: Hi, this topic has come up for discussion with some of my colleagues, and I was hoping to get a few other perspectives. For a public interface to a repository and/or digital library, would you make the handle/PURL an active hyperlink, or just provide the URL in text form? And why? My feeling is, making the URL an active hyperlink implies confidence in the PURL/Handle, and provides the user with functionality they expect of a hyperlink (right or option-click to copy, or bookmark). A permanent URL should be displayed in the address bar of the user’s browser. Then, when users do what they are going to do anyway (select the URL in the address bar and copy it), it will work. best, Erik Hetzner Sent from my free software system http://fsf.org/.
Re: [CODE4LIB] to link or not to link: PURLs
At Wed, 26 Jan 2011 17:01:05 -0500, Jonathan Rochkind wrote: It's sometimes not feasible/possible though. But it is unfortunate, and I agree you should always just do that where possible. I wonder if Google's use of the link rel=canonical element has been catching on with any other tools? Will any browsers, delicious extensions, etc., bookmark that, or offer the option to bookmark that, or anything, instead of the one in the address bar? The W3C WWW Technical Architecture Group has some interest in making 302 found redirects work as they were supposed to in browsers [1], but there is not a lot of movement there, as far as I know. In the meantime I believe that we should strive in all cases to ensure that the URL in the address bar is the permanent URL. best, Erik 1. http://www.w3.org/QA/2010/04/why_does_the_address_bar_show.html Sent from my free software system http://fsf.org/.
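As a sketch of what a bookmarking tool could do with the link rel=canonical element Jonathan mentions (Python stdlib only; the page markup below is invented for illustration):

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collect the href of <link rel="canonical"> so a bookmarking
    tool could prefer it to whatever is in the address bar."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and (a.get("rel") or "").lower() == "canonical":
            self.canonical = a.get("href")

def bookmarkable_url(address_bar_url, html):
    """Prefer the declared canonical URL; fall back to the address bar."""
    parser = CanonicalFinder()
    parser.feed(html)
    return parser.canonical or address_bar_url

page = '<html><head><link rel="canonical" href="http://example.org/purl/1"></head></html>'
```

This only helps where publishers declare the element, which is why Erik's point stands: putting the permanent URL in the address bar works for every client, not just cooperating ones.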
Re: [CODE4LIB] Inlining HTTP Headers in URLs
At Wed, 2 Jun 2010 15:23:05 -0400, Jonathan Rochkind wrote: Erik Hetzner wrote: Accept-Encoding is a little strange. It is used for gzip or deflate compression, largely. I cannot imagine needing a link to a version that is gzipped. It is also hard to imagine why a link would want to specify the charset to be used, possibly overriding a client’s preference. If my browser says it only supports UTF-8 or latin-1, it is probably telling the truth. Perhaps when the client/user-agent is not actually a web browser that is simply going to display the document to the user, but is some kind of other software. Imagine perhaps archiving software that, by policy, only will take UTF-8 encoded documents, and you need to supply a URL which is guaranteed to deliver such a thing. Sure, the hypothetical archiving software could/should(?) just send an actual HTTP header to make sure it gets a UTF-8 charset document. But maybe sometimes it makes sense to provide an identifier that actually identifies/points to the UTF-8 charset version -- and that in the actual in-practice real world is more likely to return that UTF-8 charset version from an HTTP request, without relying on content negotiation, which is often mis-implemented. We could probably come up with a similar reasonable-if-edge-case scenario for encoding. So I'm not thinking so much of over-riding the conneg -- I'm thinking of your initial useful framework: one URI identifies a more abstract 'document', the other identifies a specific representation. And sometimes it's probably useful to identify a specific representation in a specific charset, or, more of a stretch, encoding. No? I’m certainly not thinking it should never be done. Personally I would leave it out of SRU without a serious use case, but that is obviously not my decision. Still, in my capacity as nobody whatsoever, I would advise against it.
;) I notice you didn't mention 'language', I assume we agree that one is even less of a stretch, and has more clear use cases for including in a URL, like content-type. Definitely. best, Erik Sent from my free software system http://fsf.org/.
Re: [CODE4LIB] Inlining HTTP Headers in URLs
At Tue, 1 Jun 2010 14:21:56 -0400, LeVan,Ralph wrote: I've been sensing a flaw in HTTP for some time now. It seems like you ought to be able to do everything through a URL that you can using a complete interface to HTTP. Specifically, I'd love to be able to specify values for HTTP headers in a URL. To plug that gap locally, I'm looking for a java servlet filter that will look for query parameters in a URL, recognize that some of them are HTTP Headers, strip the query parms and set those Headers in the request that my java servlet eventually gets. Does such a filter exist already? I've looked and not been able to find anything. It seems like the work of minutes to produce such a filter. I'll be happy to put it out as Open Source if there's any interest. Hi - I am having a hard time imagining the use case for this. Why should you allow a link to determine things like the User-Agent header? HTTP headers are set by the client for a reason. Furthermore, as somebody involved in web archiving, I would like to ask you not to do this. It is already hard enough for us to tell that: http://example.org/HELLOWORLD is usually the same as: http://www.example.org/HELLOWORLD or: http://www.example.org/helloworld I don’t want to work in a world where this might be the same as: http://192.0.32.10/helloworld?HTTP-Host=example.org Apologies if this sounds hostile, and thanks for reading. best, Erik Hetzner Sent from my free software system http://fsf.org/.
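Erik's archiving concern can be made concrete. Web archives already canonicalize URL variants to spot likely duplicates; a deliberately naive sketch of such a canonicalizer shows why headers smuggled into query parameters defeat it (the example URLs are the ones from the email):

```python
from urllib.parse import urlsplit, urlunsplit

def canonicalize(url):
    """A (deliberately naive) canonicalizer of the sort web archives
    use to group likely-duplicate URLs: lowercase the scheme and host,
    drop a leading 'www.', drop default ports, normalize the path."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[4:]
    netloc = host
    if parts.port and parts.port not in (80, 443):
        netloc += ":%d" % parts.port
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), netloc, path, parts.query, ""))

a = canonicalize("http://example.org/HELLOWORLD")
b = canonicalize("http://www.example.org/HELLOWORLD")
# a and b collapse to the same key, but there is no mechanical way to
# know that the header-in-query-param form below names the same
# resource -- which is exactly the objection in the email.
c = canonicalize("http://192.0.32.10/helloworld?HTTP-Host=example.org")
```

Every header that can be spelled as a query parameter adds another axis of aliasing that no general-purpose canonicalizer can undo.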
Re: [CODE4LIB] Microsoft Zentity
At Wed, 28 Apr 2010 15:11:39 +0100, David Kane wrote: Andy, It is a highly extensible platform, based on .NET and windows. It is also open source! […] Here is the license: http://research.microsoft.com/en-us/downloads/48e60ac1-a95a-4163-a23d-28a914007743/Research-Output%20Repository%20Platform%20EULA%20%282008-06-06%29.txt This is not an open source license. best, Erik Hetzner ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
Re: [CODE4LIB] Temporary redirection and the location bar
At Fri, 26 Feb 2010 10:00:15 -0500, Esme Cowles escow...@ucsd.edu wrote: One solution to this problem is to use a reverse proxy instead of a redirect. We do this for our ARKs, so the temporary URL is not shown to the end user at all. This is not a general solution, especially for people who are redirecting externally and are concerned about the phishing scenario described in: http://www.w3.org/TR/2001/NOTE-cuap-20010206#cp-temp-redir I think the ideal solution would be to have the browser location bar show the original URL, with a conspicuous indication of redirection, which would provide access to the redirection chain and the final URL. Bookmarking would default to the original URL, but provide the option of using the final URL instead. Hi Esme - This is a great solution, as long as you control both sides. Thanks for pointing it out. Your solution for the browser is very close to one proposed at [1]. This bug is now 9 years old. I believe that there is some reluctance among browser authors to change the behavior at this point. If others on this list are interested in persistent identifiers, and you have some time, I think it would be worth your while to research this issue. It might be useful in the future to demonstrate that there are people who care about this issue. best, Erik Hetzner 1. https://bugzilla.mozilla.org/show_bug.cgi?id=68423#c11 ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
[CODE4LIB] Temporary redirection and the location bar
Hi - This is an issue which is of great importance to persistent identifiers on the web, and one which I thought should be brought to the attention of the c4l community. It affects PURLs, ARKs, and in general any system that redirects a persistent or permanent URI to another, temporary URI. I did not, however, realize that there was active debate about it. Briefly, from [1]: 3.4 Do not treat HTTP temporary redirects as permanent redirects. The HTTP/1.1 specification [RFC2616] specifies several types of redirects. The two most common are designated by the codes 301 (permanent) and 302 or 307 (temporary): * A 301 redirect means that the resource has been moved permanently and the original requested URI is out-of-date. * A 302 or 307 redirect, on the other hand, means that the resource has a temporary URI, and the original URI is still expected to work in the future. The user should be able to bookmark, copy, or link to the original (persistent) URI or the result of a temporary redirect. Wrong: User agents usually show the user (in the user interface) the URI that is the result of a temporary (302 or 307) redirect, as they would do for a permanent (301) redirect. There is more info at [2]. You can find the email thread at [3]. best, Erik Hetzner 1. http://www.w3.org/TR/2001/NOTE-cuap-20010206#cp-temp-redir 2. http://www.w3.org/2001/tag/group/track/issues/57 3. http://www.w3.org/mid/760bcb2a1002231400m5e9b2bb6rc80bb43c37a81...@mail.gmail.com [Attached message:] http://www.w3.org/2001/tag/2010/02/redirects-and-address-bar.txt Written in the style of a blog post, and making use of state-of-the-art theory (such as it is) of http semantics. If this gets review and approval of some kind (especially from TimBL, who has been the vocal campaigner on this question) I'll htmlify and post it and be done with this action. Not sure what else to do. Jonathan
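The CUAP guideline quoted above can be restated as a few lines of code: which URL should a user agent bookmark after following a redirect chain? A sketch (the redirect chains below are invented):

```python
def url_to_bookmark(hops):
    """hops is a redirect chain as (url, status) pairs, ending with the
    final (url, 200). Per the W3C CUAP guideline: a 301 means 'update
    your bookmark to the redirect target'; a 302 or 307 means 'keep
    the original (persistent) URI'."""
    bookmark = hops[0][0]
    for (url, status), (next_url, _) in zip(hops, hops[1:]):
        # Only a permanent redirect *of the URI we would bookmark*
        # moves the bookmark forward.
        if status == 301 and url == bookmark:
            bookmark = next_url
    return bookmark

# A resource that really moved: the bookmark should follow it.
permanent = [("http://old.example/doc", 301), ("http://new.example/doc", 200)]
# A PURL/ARK-style resolver: the bookmark should stay on the persistent URI.
temporary = [("http://purl.example/id/1", 302), ("http://host3.example/tmp/1", 200)]
```

The complaint in the email is precisely that real browsers ignore this distinction and show (and bookmark) the post-redirect URL in both cases.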
Re: [CODE4LIB] Character problems with tictoc
At Mon, 21 Dec 2009 14:59:01 -0500, Glen Newton wrote: Thanks, Erik, some useful tools and advice. Glad to help! […] But I don't understand why Firefox was ignoring the Content-Type: text/plain; charset=utf-8 It should not be using the default charset (ISO-8859-1, Latin-1) for this content, as it has been told the text encoding is UTF-8... It seems to work fine in my version of Firefox (Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.6) Gecko/20091215 Ubuntu/9.10 (karmic) Firefox/3.5.6), with latin-1 as the default. best, Erik ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
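The behaviour Glen expected amounts to: honour the charset parameter of the Content-Type header, and fall back to Latin-1 (the historical HTTP default) only when none is declared. A minimal sketch of that header handling (this is not how Firefox is actually implemented; the sample strings are invented):

```python
def decode_body(body, content_type, default="iso-8859-1"):
    """Decode an HTTP response body using the charset parameter of its
    Content-Type header, falling back to Latin-1 when none is given."""
    charset = default
    # Parameters follow the media type, e.g. "text/plain; charset=utf-8".
    for param in content_type.split(";")[1:]:
        if "=" in param:
            key, value = param.split("=", 1)
            if key.strip().lower() == "charset":
                charset = value.strip().strip('"')
    return body.decode(charset)

text = decode_body("déjà vu".encode("utf-8"), "text/plain; charset=utf-8")
```

A client that skips the parameter scan and always applies the default would show exactly the mojibake Glen was seeing.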
Re: [CODE4LIB] web archiving - was: Implementing OpenURL for simple web resources
At Fri, 18 Sep 2009 10:40:08 -0400, Ed Summers wrote: Hi Erik, all […] I haven't been following this thread completely, but you've taken it in an interesting direction. I think you've succinctly described the issue with using URLs as references in an academic context: that the integrity of the URL is a function of time. As John Kunze has said: Just because the URI was the last to see a resource alive doesn't mean it killed them :-) I'm sure you've seen this, but Internet Archive have a nice URL pattern for referencing a resource representation in time: http://web.archive.org/web/{year}{month}{day}{hour}{minute}{seconds}/{url} So for example you can reference Google's homepage on December 2, 1998 at 23:04:10 with this URL: http://web.archive.org/web/19981202230410/http://www.google.com/ As Mike's email points out this is only good as long as Internet Archive is up and running the way we expect it to. Having any one organization shoulder this burden isn't particularly scalable, or realistic IMHO. But luckily the open and distributed nature of the web allows other organizations to do the same thing--like the great work you all are doing at the California Digital Library [1] and similar efforts like WebCite [2]. It would be kinda nice if these web archiving solutions sported similar URI patterns to enable discovery. For example it looks like: http://webarchives.cdlib.org/sw1jd4pq4k/http://books.nap.edu/html/id_questions/appB.html references a frame that surrounds an actual representation in time: http://webarchives.cdlib.org/wayback.public/NYUL_ag_3/20090320202246/http://books.nap.edu/html/id_questions/appB.html Which is quite similar to Internet Archive's URI pattern -- not surprising given the common use of Wayback [3]. But there are some differences. It might be nice to promote some URI patterns for web archiving services, so that we could theoretically create applications that federated search for a known resource at a given time. 
I guess in part OpenURL was designed to fill this space, but it might instead be a bit more natural to define a URI pattern that approximated what Wayback does, and come up with some way of sharing archive locations. I'm not sure if that last bit made any sense, or if some attempt at this has been made already. Maybe something to talk about at iPRES? I had hoped that the Zotero/InternetArchive collaboration would lead to some more integration between scholarly use of the web and archiving [3]. I guess there's still time? //Ed [1] http://webarchives.cdlib.org/ [2] http://www.webcitation.org/ [3] http://inkdroid.org/journal/2007/12/17/permalinks-reloaded/ Hi Ed, code4libbers - Sorry for the late reply, but I have been on vacation. Thanks for the insightful comments. They are very much in line with things I have been thinking and you have got me thinking along some other lines as well. Our system is based on crawls, so in your example sw1jd4pq4k is a crawl id. We discussed using the .../20090101.../http://.. scheme directly as in wayback, but decided to use crawl-based URLs as our primary mechanism of entry, given the constraints of our system. (By the way, the ...wayback.public... URL should not be relied on for permanence!) We would, however, like to support the use of wayback style URLs as well. There is some interest in the web archiving community of increasing interoperability between web archive systems, so that we can, for instance, direct a user to web.archive.org if we do not have a URL in our system, and vice versa. 
In terms of getting authors to cite archived material rather than live web material, there are many approaches to this that I can think of, for example: a) Encouraging authors to link to archive.org or other web archives rather than the live web; b) Creating services to allow authors to take snapshots of websites, like webcite, if necessary; c) Rewriting links in our system to point to archives, so that, for instance, the reference (taken from the first google search for “mla website citation”, and, of course, broken): Lynch, Tim. DSN Trials and Tribble-ations Review. Psi Phi: Bradley's Science Fiction Club. 1996. Bradley University. 8 Oct. 1997 http://www.bradley.edu/campusorg/psiphi/DS9/ep/503r.html. would be rewritten to the working URL, based on the URL provided and the access time (8 Oct. 1997): http://web.archive.org/1997100800/http://www.bradley.edu/campusorg/psiphi/DS9/ep/503r.html d) Publicizing web archiving so that users know that they can use tools like the web archive to find those broken links. e) Providing browser plugins so that users who follow 404ed links can be given the alternative of proceeding to an archived web site. best, Erik Hetzner ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
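The Internet Archive URL pattern Ed quotes is regular enough to build and pick apart programmatically. A small sketch using his Google example (the function names are illustrative):

```python
from datetime import datetime

WAYBACK_PREFIX = "http://web.archive.org/web/"

def archive_url(url, when):
    # {year}{month}{day}{hour}{minute}{seconds}/{url}, zero-padded
    return WAYBACK_PREFIX + when.strftime("%Y%m%d%H%M%S") + "/" + url

def parse_archive_url(archived):
    rest = archived[len(WAYBACK_PREFIX):]
    stamp, original = rest.split("/", 1)
    return datetime.strptime(stamp, "%Y%m%d%H%M%S"), original

u = archive_url("http://www.google.com/", datetime(1998, 12, 2, 23, 4, 10))
# → "http://web.archive.org/web/19981202230410/http://www.google.com/"
```

If other archives adopted the same shape, a federated lookup would only need to swap the prefix, which is the interoperability point made above.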
Re: [CODE4LIB] Implementing OpenURL for simple web resources
At Wed, 16 Sep 2009 13:39:42 +0100, O.Stephens wrote: Thanks Erik, Yes - generally references to web sites require a 'route of access' (i.e. URL) and 'date accessed' - because, of course, the content of the website may change over time. Strictly you are right - if you are going to link to the resource it should be to the version of the page that was available at the time the author accessed it. This time aspect is something I'm thinking about more as a result of the conversations on this thread. The 'date accessed' seems like a good way of differentiating different possible resolutions of a single URL. Unfortunately references don't have a specified format for date, and they can be expressed in a variety of ways - typically you'll see something like 'Accessed 14 September 2009', but as far as I know it could be 'Accessed 14/09/09' or I guess 'Accessed 09/14/09' etc. It is also true that the intent of a reference can vary - sometimes the intent is to point at a website, and sometimes to point to the content of a website at a moment in time (thinking loosely in FRBR terms I guess you'd say that sometimes you want to reference the work/expression, and sometimes the manifestation? - although I know FRBR gets complicated when you look at digital representations, a whole other discussion) To be honest, our project is not going to delve into this too much - limited both by time (we finish in February) and practicalities (I just don't think the library/institution is going to want to look at snapshotting websites, or finding archived versions for each course we run - I suspect it would be less effort to update the course to use a more current reference in the cases this problem really manifests itself). One of the other things I've come to realise is that although it is nice to be able to access material that is referenced, the reference primarily recognises the work of others, and puts your work into context - access is only a secondary concern. 
It is perfectly possible and OK to reference material that is not generally available, as a reader I may not have access to certain material, and over time material is destroyed so when referencing rare or unique texts it may become absolutely impossible to access the referenced source. I think for research publications there is a genuine and growing issue - especially when we start to consider the practice of referencing datasets which is just starting to become common practice in scientific research. If the dataset grows over time, will it be possible to see the version of the dataset used when doing a specific piece of research? You might find the WebCite service [1] to be of some use. Of course it cannot work retroactively, so it is best if researchers use it in the first place. best, Erik Hetzner 1. http://www.webcitation.org/ ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3 pgpbj5M2lZ56Y.pgp Description: PGP signature
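The variety of 'date accessed' forms Owen lists can be normalized with a small amount of code. A sketch, with the caveat that ambiguous numeric dates are a judgment call — trying DD/MM/YY before MM/DD/YY below is an assumption, not a rule from any citation style:

```python
import re
from datetime import datetime

# Formats from the examples above: "Accessed 14 September 2009",
# "Accessed 14/09/09", "Accessed 09/14/09".
FORMATS = ["%d %B %Y", "%d/%m/%y", "%m/%d/%y"]

def parse_accessed(text):
    m = re.search(r"Accessed\s+(.+)", text)
    if not m:
        return None
    raw = m.group(1).strip().rstrip(".")
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw, fmt).date()
        except ValueError:
            continue  # try the next candidate format
    return None
```

Note that "09/14/09" happens to disambiguate itself here (month 14 is invalid, so the DD/MM/YY parse fails), but "04/05/09" would not, which is exactly the problem with unstandardized access dates.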
Re: [CODE4LIB] Implementing OpenURL for simple web resources
Hi Owen, all: This is a very interesting problem. At Tue, 15 Sep 2009 10:04:09 +0100, O.Stephens wrote: […] If we look at a website it is pretty difficult to reference it without including the URL - it seems to be the only good way of describing what you are actually talking about (how many people think of websites by 'title', 'author' and 'publisher'?). For me, this leads to an immediate confusion between the description of the resource and the route of access to it. So, to differentiate I'm starting to think of the http URI in a reference like this as a URI, but not necessarily a URL. We then need some mechanism to check, given a URI, what is the URL. […] The problem with the approach (as Nate and Eric mention) is that any approach that relies on the URI as a identifier (whether using OpenURL or a script) is going to have problems as the same URI could be used to identify different resources over time. I think Eric's suggestion of using additional information to help differentiate is worth looking at, but I suspect that this is going to cause us problems - although I'd say that it is likely to cause us much less work than the alternative, which is allocating every single reference to a web resource used in our course material it's own persistent URL. […] I might be misunderstanding you, but, I think that you are leaving out the implicit dimension of time here - when was the URL referenced? What can we use to represent the tuple URL, date, and how do we retrieve an appropriate representation of this tuple? Is the most appropriate representation the most recent version of the page, wherever it may have moved? Or is the most appropriate representation the page as it existed in the past? I would argue that the most appropriate representation would be the page as it existed in the past, not what the page looks like now - but I am biased, because I work in web archiving. 
Unfortunately this is a problem that has not been very well addressed by the web architecture people, or the web archiving people. The web architecture people start from the assumption that http://example.org/ is the same resource which only varies in its representation as a function of time, not in its identity as a resource. The web archives people create closed systems and do not think about how to store and resolve the tuple, URL, date. I know this doesn’t help with your immediate problem, but I think these are important issues. best, Erik Hetzner ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3 pgpoU4UofTFjn.pgp Description: PGP signature
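Resolving the (URL, date) tuple Erik describes amounts to finding the latest capture at or before the cited access time. A toy sketch with a hypothetical in-memory index (real archives would query a CDX-style index instead):

```python
import bisect
from datetime import datetime

# Hypothetical index: URL -> sorted list of capture timestamps.
snapshots = {
    "http://example.org/": [
        datetime(2007, 1, 5), datetime(2008, 6, 1), datetime(2009, 3, 2),
    ],
}

def resolve(url, when):
    """Treat the (URL, date) tuple -- not the URL alone -- as the
    identifier: return the latest capture at or before `when`."""
    captures = snapshots.get(url, [])
    i = bisect.bisect_right(captures, when)
    return captures[i - 1] if i else None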
Re: [CODE4LIB] Implementing OpenURL for simple web resources
At Mon, 14 Sep 2009 14:48:23 +0100, O.Stephens wrote: I'm working on a project called TELSTAR (based at the Open University in the UK) which is looking at the integration of resources into an online learning environment (see http://www.open.ac.uk/telstar for the basic project details). The project focuses on the use of References/Citations as the way in which resources are integrated into the teaching material/environment. We are going to use OpenURL to provide links (where appropriate) from references to full text resources. Clearly for journals, articles, and a number of other formats this is a relatively well understood practice, and implementing this should be relatively straightforward. However, we also want to use OpenURL even where the reference is to a more straightforward web resource - e.g. a web page such as http://www.bbc.co.uk. This is in order to ensure that links provided in the course material are persistent over time. A brief description of what we perceive to be the problem and the way we are tackling it is available on the project blog at http://www.open.ac.uk/blogs/telstar/2009/09/14/managing-link-persistence-with-openurls/ (any comments welcome). What we are considering is the best way to represent a web page (or similar - pdf etc.) in an OpenURL. It looks like we could do something as simple as: http://resolver.address/?url_ver=Z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&rft_id=http%3A%2F%2Fwww.bbc.co.uk Is this sufficient (and correct)? Should we consider passing fuller metadata? If the latter should we use the existing KEV DC representation, or should we be looking at defining a new metadata format? Any help would be very welcome. Here are some things that I would take into consideration, not related to the technical OpenURL question, but I think relevant anyhow. a) What will people do if the service that you provide goes away? 
A good thing about the OpenURL that you have above is that even if your resolver no longer works, a savvy user can see that the OpenURL is supposed to point at http://www.bbc.co.uk/. A bad thing about the old URL that you have on your blog: http://routes.open.ac.uk/ixbin/hixclient.exe?_IXDB_=routes_IXSPFX_=gsubmit-button=summary$+with+res_id+is+res9377 is that when that URL stops working - I will bet money it will stop working before www.bbc.co.uk stops working - nobody will know what it meant. b) How can you ensure that your service will not go away? What is the institutional commitment? If you can’t provide a stronger commitment than, e.g., www.bbc.co.uk, is this worth doing? c) Who will maintain the database that redirects www.bbc.co.uk to www.neobbc.co.uk? (see second part of b above). d) Is there a simpler solution to this problem than OpenURL? e) Finally: how many problems will this solve? It seems to me that this is only useful in the case of URL A1 moving to A2 (e.g., following an organization rename) where the organization does not maintain a redirect. In other words, it is not particularly useful in cases where URL A1 goes away completely (in which case there is no unarchived URL to go to) and where a redirect is maintained from A1 to A2 (in which case there is no need to maintain your own redirect). How many instances of this are there? Maybe there are many; www.bbc.co.uk is a bad example, but a journal article online might move around a lot. Hope that is useful! Thanks for reading. best, Erik Hetzner
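The OpenURL quoted in this message is just three KEV pairs, so assembling it is a one-liner over `urlencode`. A sketch (the resolver address is Owen's placeholder; the function name is illustrative):

```python
from urllib.parse import urlencode

def web_page_openurl(resolver, url):
    # KEV pairs from the example above; rft_id carries the cited URL
    params = {
        "url_ver": "Z39.88-2004",
        "url_ctx_fmt": "info:ofi/fmt:kev:mtx:ctx",
        "rft_id": url,
    }
    return resolver + "?" + urlencode(params)

link = web_page_openurl("http://resolver.address/", "http://www.bbc.co.uk")
# rft_id is percent-encoded: ...rft_id=http%3A%2F%2Fwww.bbc.co.uk
```

This also illustrates Erik's point (a): even with the resolver dead, the `rft_id` value remains legible to a savvy reader.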
Re: [CODE4LIB] Recommend book scanner?
At Fri, 1 May 2009 09:51:19 -0500, Amanda P wrote: On the other hand, there are projects like bkrpr [2] and [3], home-brew scanning stations built for marginally more than the cost of a pair of $100 cameras. Cameras around $100 are very low quality. You could get nowhere near the dpi recommended for materials that need to be OCRed. The quality of images from cameras would be not only low, but the OCR (even with the best software) would probably have many errors. For someone scanning items at home this might be ok, but for archival quality, I would not recommend cameras. If you are grant funded and the grant provider requires a certain level of quality, you need to make sure the scanning mechanism you use can scan at that quality. I know very little about digital cameras, so I hope I get this right. According to Wikipedia, Google uses (or used) an 11MP camera (Elphel 323). You can get a 12MP camera for about $200. With a 12MP camera you should easily be able to get 300 DPI images of book pages and letter size archival documents. With a $100 camera you can get more or less 300 DPI images of book pages. * The problems I have always seen with OCR had more to do with alignment and artifacts than with DPI. 300 DPI is fine for OCR as far as my (limited) experience goes - as long as you have quality images. If your intention is to scan items for preservation, then, yes, you want higher quality - but I can’t imagine any setup for archival quality costing anywhere near $1000. If you just want to make scans and full-text OCR available, these setups seem worth looking at - especially if the software workflow can be improved. best, Erik * 12 MP seems to equal 4256 x 2848 pixels. To take a ‘scan’ (photo) of a page at 300 DPI, that page would need to be at most 14.18 x 9.49 inches (dividing pixels / 300). As long as you can get the camera close enough to the page to not waste much space, you will be getting close to the 300 DPI range for images of size 8.5 x 11 or less. 
;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
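The arithmetic in Erik's footnote is easy to check. A small sketch (the function name is illustrative):

```python
def max_page_size_at_dpi(px_w, px_h, dpi=300):
    # Largest page (in inches) a sensor can cover at the target DPI:
    # divide the pixel dimensions by the dots-per-inch.
    return px_w / dpi, px_h / dpi

# The 12 MP sensor from the footnote (4256 x 2848 pixels)
w, h = max_page_size_at_dpi(4256, 2848)
# → roughly 14.19 x 9.49 inches, so an 8.5 x 11 page fits at ~300 DPI
```

The same function run at `dpi=400` gives about 10.6 x 7.1 inches, which shows why higher-DPI archival targets push toward more expensive cameras.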
Re: [CODE4LIB] Recommend book scanner?
At Wed, 29 Apr 2009 13:32:08 -0400, Christine Schwartz wrote: We are looking into buying a book scanner which we'll probably use for archival papers as well--probably something in the $1,000.00 range. Any advice? Most organizations, or at least the big ones, Internet Archive and Google, seem to be using a design based on 2 fixed cameras rather than a traditional scanner-type device. Is this what you had in mind? Unfortunately none of these products are cheap. Internet Archive’s Scribe machine cost upwards of $15k (3 years ago), [1] mostly because it has two very expensive cameras. Google’s data is unavailable. A company called Kirtas also sells what look like very expensive machines of a similar design. On the other hand, there are projects like bkrpr [2] and [3], home-brew scanning stations built for marginally more than the cost of a pair of $100 cameras. I think that these are a real possibility for smaller organizations. The maturity of the software and workflow is problematic, but with Google’s Ocropus OCR software [4] freely available as the heart of a scanning workflow, the possibility is there. Both bkrpr and [3] have software currently available, although in the case of bkrpr at least the software is in the very early stages of development. best, Erik Hetzner 1. http://redjar.org/jared/blog/archives/2006/02/10/more-details-on-open-archives-scribe-book-scanner-project/ 2. http://bkrpr.org/doku.php 3. http://www.instructables.com/id/DIY-High-Speed-Book-Scanner-from-Trash-and-Cheap-C/ 4. http://code.google.com/p/ocropus/ ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
Re: [CODE4LIB] registering info: uris?
At Thu, 2 Apr 2009 13:47:50 +0100, Mike Taylor wrote: Erik Hetzner writes: Without external knowledge that info:doi/10./xxx is a URI, I can only guess. Yes, that is true. The point is that by specifying that the rft_id has to be a URI, you can then use other kinds of URI without needing to broaden the specification. So: info:doi/10./j.1475-4983.2007.00728.x urn:isbn:1234567890 ftp://ftp.indexdata.com/pub/yaz [Yes, I am throwing in an ftp: URL as an identifier just because I can -- please let's not get sidetracked by this very bad idea :-) ] This is not just hypothetical: the flexibility is useful and the encapsulation of the choice within a URI is helpful. I maintain an OpenURL resolver that handles rft_id's by invoking a plugin depending on what the URI scheme is; for some URI schemes, such as info:, that then invokes another, lower-level plugin based on the type (e.g. doi in the example above). Such code is straightforward to write, simple to understand, easy to maintain, and nice to extend, since all you have to do is provide one more encapsulated plugin. Thanks for the clarification. Honestly I was also responding to Rob Sanderson’s message (bad practice, surely) where he described URIs as ‘self-describing’, which seemed to me unclear. URIs are only self-describing insofar as they describe what type of URI they are. I think that all of us in this discussion like URIs. I can’t speak for, say, Andrew, but, tentatively, I think that I prefer info:doi/10./xxx to plain 10.111/xxx. I would just prefer http://dx.doi.org/10./xxx (Caveat: I have no idea what rft_id, etc, means, so maybe that changes the meaning of what you are saying from how I read it.) No, it doesn't :-) rft_id is the name of the parameter used in OpenURL 1.0 to denote a referent ID, which is the same thing I've been calling a Thing Identifier elsewhere in this thread. 
The point with this part of OpenURL is precisely that you can just shove any identifier at the resolver and leave it to do the best job it can. Your only responsibility is to ensure that the identifier you give it is in the form of a URI, so the resolver can use simple rules to pick it apart and decide what to do. Thanks. best, Erik Hetzner pgprSzdg7GAkN.pgp Description: PGP signature
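The two-level plugin dispatch Mike describes — pick a handler from the URI scheme, then for info: URIs dispatch again on the namespace — can be sketched in a few lines. The handler bodies and the sample DOI are made up for illustration:

```python
def handle_doi(value):
    return "resolve DOI " + value

def handle_http(uri):
    return "fetch " + uri

# Lower-level plugins for info: URIs, keyed by namespace
INFO_PLUGINS = {"doi": handle_doi}

def handle_info(uri):
    namespace, value = uri[len("info:"):].split("/", 1)
    return INFO_PLUGINS[namespace](value)

# Top-level plugins, keyed by URI scheme
SCHEME_PLUGINS = {"info": handle_info, "http": handle_http}

def resolve_rft_id(rft_id):
    scheme = rft_id.split(":", 1)[0]
    return SCHEME_PLUGINS[scheme](rft_id)
```

Extending the resolver is just another dictionary entry, which is the "one more encapsulated plugin" point.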
Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)
Hi Ray - At Thu, 2 Apr 2009 13:48:19 -0400, Ray Denenberg, Library of Congress wrote: You're right, if there were a web: URI scheme, the world would be a better place. But it's not, and the world is worse off for it. Well, the original concept of the ‘web’ was, as I understand it, to bring together all the existing protocols (gopher, ftp, etc.), with the new one in addition (HTTP), with one unifying address scheme, so that you could have this ‘web browser’ that you could use for everything. So web: would have been nice, but probably wouldn’t have been accepted. As it turns out, HTTP won overwhelmingly, and the older protocols died off. It shouldn't surprise anyone that I am sympathetic to Karen's criticisms. Here is some of my historical perspective (which may well differ from others'). Back in the old days, URIs (or URLs) were protocol based. The ftp scheme was for retrieving documents via ftp. The telnet scheme was for telnet. And so on. Some of you may remember the ZIG (Z39.50 Implementors Group) back when we developed the z39.50 URI scheme, which was around 1995. Most of us were not wise to the ways of the web that long ago, but we were told, by those who were, that z39.50r: and z39.50s: at the beginning of a URL are explicit indications that the URI is to be resolved by Z39.50. A few years later the semantic web was conceived and a lot of SW people began coining all manner of http URIs that had nothing to do with the http protocol. By the time the rest of the world noticed, there were so many that it was too late to turn back. So instead, history was altered. The company line became "we never told you that the URI scheme was tied to a protocol." Instead, they should have bit the bullet and coined a new scheme. They didn't, and that's why we're in the mess we're in. Not knowing the details of the history, your account seems correct to me, except that I don’t think the web people tried to alter history. 
I think of the web as having been a learning experience for all of us. Yes, we used to think that the URI was tied to the protocol. But we have learned that it doesn’t need to be, that HTTP URIs can be just identifiers which happen to be dereferenceable at the moment using the HTTP protocol. And it became useful to begin identifying lots of things, people and places and so on, using identifiers, and it also seemed useful to use a protocol that existed (HTTP), instead of coming up with the Person-Metadata Transfer Protocol and inventing a new URI scheme (pmtp://...) to resolve metadata about persons. Because HTTP doesn’t care what kind of data it is sending down the line; it can happily send metadata about people. But that is how things grow; the http:// at the beginning of a URI may eventually be a spandrel, when HTTP is dead and buried. And people will wonder why the address http://dx.doi.org/10./xxx has those funny characters in front of it. And doi.org will be long gone, because they ran out of money, and their domain was taken over by squatters, so we all had to agree to alter our browsers to include an override to not use DNS to resolve the dx.doi.org domain but instead point to a new, distributed system of DOI resolution. We will need to fix these problems as they arise. In my opinion, if we are interested in identifier persistence, clarity about the difference between things and information about things, creating a more useful web (of data), and the other things we ought to be interested in, our time is best spent worrying about these things, and how they can be built on top of the web. Our time is not well spent in coming up with new ways to do things the web already does for us. For instance: if there is concern that HTTP URIs are not seen as being persistent, it would be useful to try to add a method to HTTP which indicated the persistence of an identifier. This way browsers could display a little icon that indicated that the URI was persistent. 
A user could click on this icon and get information about the institution which claimed persistence for the URI, what the level of support was, what other institution could back up that claim, etc. Our time would not be well spent coming up with an elaborate scheme for phttp:// URIs, creating a better DNS, with name control by a better institution, and a better HTTP, with metadata, and a better caching system, and so on. This is a lot of work and you forget what you were trying to do in the first place, which is make HTTP URIs persistent. best, Erik ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
Re: [CODE4LIB] registering info: uris?
At Thu, 2 Apr 2009 19:29:49 +0100, Rob Sanderson wrote: All I meant by that was that the info:doi/ URI is more informative as to what the identifier actually is than just the doi by itself, which could be any string. Equally, if I saw an SRW info URI like: info:srw/cql-context-set/2/relevance-1.0 that's more informative than some ad-hoc URI for the same thing. Without the external knowledge that info:doi/xxx is a DOI and info:srw/cql-context-set/2/ is a cql context set administered by the owner with identifier '2' (which happens to be me), then they're still just opaque strings. Yes, info:doi/10./xxx is more easily recognizable (‘sniffable’) as a DOI than 10./xxx, both for humans and machines. If we don’t know, by some external means, that a given string has the form of some identifier, then we must guess, or sniff it. But it is good practice to use other means to ensure that we know whether or not any given string is an identifier, and if it is, what type it is. Otherwise we can get confused by strings like go:home. Was that a URI or not? That said, I see no reason why the URI: info:srw/cql-context-set/2/relevance-1.0 is more informative than the URI: http://srw.org/cql-context-set/2/relevance-1.0 As you say, both are just opaque URIs without the additional information. This information is provided by, in the first case, the info-uri registry people, or, in the second case, by the organization that owns srw.org. I could have said that http://srw.cheshire3.org/contextSets/rel/ was the identifier for it (SRU doesn't care) but that's the location for the retrieval documentation for the context set, not a collection of abstract access points. If srw.cheshire3.org was to go away, then people can still happily use the info URI with the continued knowledge that it shouldn't resolve to anything. If srw.cheshire3.org goes away, people can still happily use the http URI. 
(see below) With the potential dissolution of DLF, this has real implications, as DLF have an info URI namespace. If they'd registered a bunch of URIs with diglib.org instead, which will go away, then people would have trouble using them. Notably when someone else grabs the domain and starts using the URIs for something else. The original URIs are still just as useful as identifiers; they have become less useful as dereferenceable identifiers. Now if DLF were to disband AND reform, then they can happily go back to using info:dlf/ URIs even if they have a brand new domain. The info:dlf/ URIs would be the same non-dereferenceable URIs they always were, true. But what have we gained? The issue of persistence of dereferenceability is a real one. There are solutions, e.g., other organizations can step in to host the domain; the ARK scheme; or, we can all agree that the diglib.org domain is too important to let be squatted, and agree that URIs that begin with http://diglib.org/ are special, and should by-pass DNS. [1] I think that all of us in this discussion like URIs. I can’t speak for, say, Andrew, but, tentatively, I think that I prefer info:doi/10./xxx to plain 10.111/xxx. I would just prefer http://dx.doi.org/10./xxx. info URIs, In My Opinion, are ideally suited for long term identifiers of non information resources. But http URIs are definitely better than something which isn't a URI at all. Something we can all agree on! URIs are better than no URIs. best, Erik 1. Take with a grain of salt, as this is not something I have fully thought out the implications of. ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
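The "sniffing" problem in this exchange — is go:home a URI or not? — comes down to the fact that scheme syntax alone cannot settle the question. A loose sketch: a check against the generic scheme grammar accepts go:home too, which is exactly the ambiguity Erik describes.

```python
import re

# RFC 3986 scheme syntax: a letter followed by letters, digits, "+",
# "-", or ".", then a colon. Matching this only means the string
# *looks* like a URI; it does not prove it was meant as one.
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")

def looks_like_uri(s):
    return bool(SCHEME_RE.match(s))
```

A naked DOI fails the check (no scheme), while both info:doi/... and go:home pass, so some external convention is still needed to know which strings are identifiers.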
Re: [CODE4LIB] points of failure (was Re: [CODE4LIB] resolution and identification )
At Thu, 2 Apr 2009 11:34:12 -0400, Jonathan Rochkind wrote: […] I think too much of this conversation is about people's ideal vision of how things _could_ work, rather than trying to make things work as best as we can in the _actual world we live in_, _as well as_ planning for the future when hopefully things will work even better. You need a balance between the two. This is a good point. But as I see it, the web people - for lack of a better word - *are* discussing the world we live in. It is those who want to re-invent better ways of doing things who are not. HTTP is here. HTTP works. *Everything* (save one) people want to do with info: URIs or urn: URIs or whatever already works with HTTP. I can count one thing that info URIs possess that HTTP URIs don’t: the ‘feature’ of not ever being dereferenceable. And even that is up in the air - somebody could devise a method to dereference them at any time. And then where are you? […] a) Are as likely to keep working indefinitely, in the real world of organizations with varying levels of understanding, resources, and missions. Could somebody explain to me the way in which this identifier: http://suphoa5d.org/phae4ohg does not work *as an identifier*, absent any way of getting information about the referent, in a way that: info:doi/10.10.1126/science.298.5598.1569 does work? I don’t mean to be argumentative - I really want to know! I think there may be something that I am missing here. b) Are as likely as possible to be adopted by as many people as possible for inter-operability. Having an ever-increasing number of possible different URIs to represent the same thing is something to be avoided if possible. +1 c) Are as useful as possible for the linked data vision. +1 […] best, Erik ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3 pgpTIK69UTZMm.pgp Description: PGP signature
Re: [CODE4LIB] points of failure (was Re: [CODE4LIB] resolution and identification )
Erik Hetzner writes: Could somebody explain to me the way in which this identifier: http://suphoa5d.org/phae4ohg does not work *as an identifier*, absent any way of getting information about the referent, in a way that: info:doi/10.10.1126/science.298.5598.1569 does work? A quick clarification - before I digest Mike’s thoughts - I didn’t mean to make a meaningless HTTP URI but a meaningful info URI. What I was trying to illustrate was a non-dereferenceable URI. So, for: http://suphoa5d.org/phae4ohg please read instead: http://defunctdois.org/10.10.1126/science.298.5598.1569 Thanks! best, Erik ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3 pgpwlL93ehevk.pgp Description: PGP signature
Re: [CODE4LIB] registering info: uris?
At Wed, 1 Apr 2009 14:34:45 +0100, Mike Taylor wrote: Not quite. Embedding a DOI in an info URI (or a URN) means that the identifier describes its own type. If you just get the naked string 10./j.1475-4983.2007.00728.x passed to you, say as an rft_id in an OpenURL, then you can't tell (except by guessing) whether it's a DOI, a SICI, and ISBN or a biological species identifier. But if you get info:doi/10./j.1475-4983.2007.00728.x then you know what you've got, and can act on it accordingly. It seems to me that you are just pushing out by one more level the mechanism to be able to tell what something is. That is - before you needed to know that 10./xxx was a DOI. Now you need to know that info:doi/10./xxx is a URI. Without external knowledge that info:doi/10./xxx is a URI, I can only guess. (Caveat: I have no idea what rft_id, etc, means, so maybe that changes the meaning of what you are saying from how I read it.) -Erik ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3 pgpRKlTtYU7Wa.pgp Description: PGP signature
Re: [CODE4LIB] registering info: uris?
At Fri, 27 Mar 2009 20:56:42 -0400, Ross Singer wrote: So, in what is probably a vain attempt to put this debate to rest, I created a partial redirect PURL for sudoc: http://purl.org/NET/sudoc/ If you pass it any urlencoded sudoc string, you'll be redirected to the GPO's Aleph catalog that searches the sudoc field for that string. http://purl.org/NET/sudoc/E%202.11/3:EL%202 should take you to: http://catalog.gpo.gov/F/?func=find-c&ccl_term=GVD%3DE%202.11/3:EL%202 There, Jonathan, you have a dereferenceable URI structure that you A) don't have to worry about pointing at something misleading B) don't have to maintain (although I'll be happy to add whoever as a maintainer to this PURL) If the GPO ever has a better alternative, we just point the PURL at it in the future. Beautiful work, Ross. Thank you. best, Erik ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3 pgpC8fHWXKSFo.pgp Description: PGP signature
Re: [CODE4LIB] registering info: uris?
At Mon, 30 Mar 2009 10:12:39 -0400, Ray Denenberg, Library of Congress wrote: Leaving aside religious issues I just want to be sure we're clear on one point: the work required for the info URI process is exactly the amount of work required, no more no less. It forces you to specify clear syntax and semantics, normalization (if applicable), etc. If you go a different route because it's less work, then you're probably avoiding doing work that needs to be done. Reading over your previous message regarding mapping SuDocs syntax to URI syntax, I completely agree about the necessity of clarifying these rules. But I was referring to the bureaucratic overhead (little though it may be) in registering an info: URI. This overhead may or may not be useful, but it is there, including a submission process, internal review, and public comments (according to the draft info URI registry policy). -Erik ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3 pgpz1Vry1WFt3.pgp Description: PGP signature
Re: [CODE4LIB] registering info: uris?
At Mon, 30 Mar 2009 13:58:04 -0400, Jonathan Rochkind wrote: It's interesting that there are at least three, if not four, viewpoints being represented in this conversation. The first argument is over whether all identifiers should be resolvable or not. While I respect the argument that it's _useful_ to have resolvable (to something) identifiers, I think it's an unnecessary limitation to say that all identifiers _must_ be resolvable. There are cases where it is infeasible on a business level to support resolvability. It may be for as simple a reason as that the body who actually maintains the identifiers is not interested in providing such at present. You can argue that they _ought_ to be, but back in the real world, should that stand as a barrier to anyone else using URI identifiers based on that particular identifier system? Wouldn't it be better if it didn't have to be? [ Another obvious example is the SICI -- an identifier for a particular article in a serial. Making these all resolvable in a useful way is a VERY non-trivial exercise. It is not at all easy, and a solution is definitely not cheap (DOI is an attempted solution, which some publishers choose not to pay for; both the DOI fees and the cost of building out their own infrastructure to support it). Why should we be prevented from using identifiers for a particular article in a serial until this difficult and expensive problem is solved?] So I don't buy that all identifiers must always be resolvable, and that if we can't make an identifier resolvable we can't use it. That excludes too much useful stuff. I don’t actually think that there is anybody who is arguing that all identifiers must be resolvable. There are people who argue that there are identifiers which must NOT be resolvable; at least in their basic form. (see Stuart Weibel [1]). […] best, Erik 1. 
http://weibel-lines.typepad.com/weibelines/2006/08/uncoupling_iden.html ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3 pgpuKdGTC0Mj7.pgp Description: PGP signature
Re: [CODE4LIB] registering info: uris?
At Mon, 30 Mar 2009 15:52:10 -0400, Jonathan Rochkind wrote: Erik Hetzner wrote: I don’t actually think that there is anybody who is arguing that all identifiers must be resolvable. There are people who argue that there are identifiers which must NOT be resolvable; at least in their basic form. (see Stuart Weibel [1]). There are indeed people arguing that, Erik, on this very list. Like, in the email I responded to (did you read that one?). That's why I wrote what I did, man! You know I'm the one who cited Stu's argument first on this list! I am aware of his arguments. I am aware of people arguing various things on this issue. My apologies for missing Andrew’s argument and not pointing out that you had originally pointed to Stuart’s argument. But when did someone suggest that all identifiers must be resolvable? When Andrew argued that: Having unresolvable URIs is anti-Web since the Web is a hypertext system where links are required to make it useful. Exposing unresolvable links in content on the Web doesn't make the Web more useful. Okay, I guess he didn't actually SAY that you should never have non-resolvable identifiers, but he rather strongly implied it, by using the anti-Web epithet. Given Andrew’s later response, I would like to restate my previous argument: I don’t [] think that there is anybody who is +seriously+ arguing that all identifiers must be resolvable +to be useful as identifiers+. best, Erik ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3 pgps01lTF1mj0.pgp Description: PGP signature
Re: [CODE4LIB] registering info: uris?
At Fri, 27 Mar 2009 15:36:43 -0400, Jonathan Rochkind wrote: Thanks Ray. Oh boy, I don't know enough about SuDoc to describe the syntax rules fully. I can spend some more time with the SuDoc documentation (written for a pre-computer era) and try to figure it out, or do the best I can. I mean, the info registration can clearly point to the existing SuDoc documentation and say one of these -- but actually describing the syntax formally may or may not be possible/easy/possible-for-me-personally. I can't even tell if normalization would be required or not. I don't think so. I think SuDocs don't suffer from the problem LCCNs did that required normalization; I think they already have a consistent form, but I'm not certain. I'll see what I can do with it. But Ray, you work for 'the government'. Do you have a relationship with a counter-part at GPO that might be interested in getting involved with this? Hi Jonathan - Obviously I don’t know your requirements, but I’d like to suggest that before going down the info: URI road, you read the W3C Technical Architecture Group’s finding ‘URNs, Namespaces and Registries’ [1]. | Abstract | This finding addresses the questions “When should URNs or URIs with | novel URI schemes be used to name information resources for the | Web?” and “Should registries be provided for such identifiers?”. The | answers given are “Rarely if ever” and “Probably not”. Common | arguments in favor of such novel naming schemas are examined, and | their properties compared with those of the existing http: URI | scheme. | Three case studies are then presented, illustrating how the http: | URI scheme can be used to achieve many of the stated requirements | for new URI schemes. best, Erik Hetzner 1. http://www.w3.org/2001/tag/doc/URNsAndRegistries-50 ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3 pgpvBsZoxJDPh.pgp Description: PGP signature
Re: [CODE4LIB] registering info: uris?
At Fri, 27 Mar 2009 17:18:24 -0400, Jonathan Rochkind wrote: I am not interested in maintaining a sudoc.info registration, and neither is my institution, who I wouldn't trust to maintain it (even to the extent of not letting the DNS registration expire) after I left. I think even something as simple as this really needs to be committed to by an organization. So yeah, even willing to take on the responsibility of owning that domain until such time as something useful can be done with it, I do not have, and to me that seems like a requirement, not just a nice to have. I see your point. I believe that registering a domain would be less work than going through an info URI registration process, but I don’t know how difficult the info URI registration process would be (thus bringing the conversation full circle). [1] But it certainly is another option. I feel like most people have the _expectation_ of http resolvability for http URIs though, even though it isn't actually required. If you want there to be an actual http server there at ALL, even one that just responds to all requests with a link to the SuDoc documentation, that's another thing you need. I think there is a strong expectation that if I resolve a URI, I do not end up with a domain squatter. Otherwise I am not so sure what is expected when using an HTTP URI whose primary purpose is identification, not dereferencing. Personally I would be happy to get either a page telling me to check back later [2], or nothing at all. best, Erik Hetzner 1. My last word on this. Because I am already beating a dead horse, I have put it in a footnote. For $100 and basically no time at all you can have 10 years of sudoc.info. If it takes an organization more than 2 or 3 hours of work to register an info: URI, then domain registration is a better deal, as I see it. 2. http://lccn.info/2002022641 ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3 pgpLGEdroPmog.pgp Description: Digital Signature
Re: [CODE4LIB] Linux Public Computers - time and ticket reservation system
At Mon, 5 Jan 2009 11:02:31 -0500, Darrell Eifert deif...@hampton.lib.nh.us wrote: Actually, I meant 'free' in both senses, but mostly in the sense of 'free of charge'. Thanks for the clarification. In that case I have to agree with Karen. Being free (as in beer) tends to be a property that results from the principles of free (as in speech) software, but it is not a goal in itself for most free/open source software developers. I hate to be blunt, but I think it's pretty safe to say that Ubuntu, Koha, GIMP, OpenOffice, Joomla and even the option of Linux itself would never exist or have gained traction and a developer base if these products were not freely available. Probably - but they certainly never would have gained a developer base if they were not free in the sense of having the source code available, and allowing modifications. Freedom is more important to community building than giving the software away without cost. Groovix and Userful are selling proprietary public-use computer management packages at a higher cost than their XP equivalents. If an open source LTSP solution were available under Linux (as in the Edubuntu package for schools) I would be much happier about recommending Linux as a solution for public-use computers in small to medium-sized independent public libraries. Again, I would invite those interested in providing help on this project to look at the feature list of 'Time Limit Manager' from Fortres -- that's what I want in an LTSP package. (As an analogy, remember that Koha was once just an idea floating around in some idealistic New Zealander's head.) http://www.fortresgrand.com/products/tlm/tlm.htm Groovix claims to be GPLed, though they do not make it easy to get the software. Here is some info: http://wiki.groovix.org/index.php?title=GroovixSoftwareInstaller best, Erik pgpzjohkt6SmU.pgp Description: PGP signature
Re: [CODE4LIB] COinS in OL?
At Mon, 1 Dec 2008 08:15:24 -0800, Raymond Yee wrote: Having COinS embedded in the Open Library would be useful. Zotero would have made use of such COinS -- but because they were absent, a custom translator was written to grab the bibliographic metadata from OL. Zotero also supports unAPI, which in my opinion is a much better system for getting bibliographic metadata from web sites than COinS. best, Erik Hetzner ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3 pgptmLUp83fHi.pgp Description: PGP signature
Re: [CODE4LIB] djatoka
At Tue, 18 Nov 2008 06:13:46 -0500, Ed Summers [EMAIL PROTECTED] wrote: Thanks for bringing this up Erik. It really does seem to be preferable to me to treat these tiles as web resources in their own right, and to avoid treating them like resources that need to be routed to with OpenURL. It also seems preferable to leverage RESTful practices like using the Accept header. I wonder if it would improve downstream cache-ability to push parts of the query string into the path of the URL, for example: http://an.example.org/ds/CB_TM_QQ432/4/0/899/1210/657/1106 Which could be documented with a URI template [1]: http://an.example.org/ds/{id}/{level}/{rotate}/{y}/{x}/{height}/{width} I guess I ought to read the paper (and refresh my knowledge of http caching) to see if portions of the URI would need to be optional, and what that would mean. Still, sure is nice to see this sort of open source work going on around jpeg2000. My nagging complaint about jpeg2000 as a technology is the somewhat limited options it presents tool wise ... and djatoka is certainly movement in the right direction. It might improve cache-ability: my understanding (not checking sources here) is that many caches do not cache GETs to URIs with query parts, although it is allowed. However: query parameter order does matter, so an explicitly ordered URI template could certainly prevent the problem of: http://example.org/?a=1&b=2 being considered a different resource than: http://example.org/?b=2&a=1 If you read rest-discuss, there have been discussions of image manipulation with URI query parameters/paths. http://article.gmane.org/gmane.comp.web.services.rest/6699 http://article.gmane.org/gmane.comp.web.services.rest/8167 There seem to be advantages to both methods (query parameters/paths). 
There is the further possibility of using path parameters [1], which seems a pretty natural fit, but not widely used: http://an.example.org/ds/{id};level={level};rotate={rotate};y={y};x={x};height={height};width={width} Additionally, I think that reading about how Amazon does (mostly) the same thing would be useful: http://www.aaugh.com/imageabuse.html I think that the library community could contribute to possible work in standardizing, to some extent, image manipulation with URIs; but I do feel that using OpenURL will slow or prevent uptake. best, Erik Hetzner 1. http://www.w3.org/DesignIssues/Axioms.html#matrix ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3 pgp8rzRb0n1cE.pgp Description: PGP signature
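To make the cache-key point above concrete: caches treat URIs as opaque strings, so the same query parameters in a different order produce distinct cache entries. A minimal Python sketch (purely illustrative, using only the standard library) of normalizing parameter order so equivalent URIs map to one cache key:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def normalize_query(uri):
    """Canonicalize a URI by sorting its query parameters.

    A cache keys on the literal URI string, so ?a=1&b=2 and ?b=2&a=1
    are different cache entries even if the server treats them alike.
    Sorting the pairs makes the two forms compare (and cache) as one.
    """
    parts = urlsplit(uri)
    query = urlencode(sorted(parse_qsl(parts.query)))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query,
                       parts.fragment))

# Both spellings normalize to the same string:
print(normalize_query("http://example.org/?b=2&a=1"))
print(normalize_query("http://example.org/?a=1&b=2"))
```

This is one reason an explicitly ordered URI template (or pushing the parameters into the path) helps: the ordering is fixed at mint time, so no normalization step is needed downstream.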
Re: [CODE4LIB] djatoka
At Fri, 14 Nov 2008 06:10:45 -0500, Birkin James Diana [EMAIL PROTECTED] wrote: Yesterday I attended a session of the DLF Fall Forum at which Ryan Chute presented on djatoka, the open-source jpeg2000 image-server he and Herbert Van de Sompel just released. It's very cool and near the top of my crowded list of things to play with. If any of you have had the good fortune to experiment with it or implement it into some workflow, get over to the code4libcon09 presentation-proposal page pronto! And if you're as jazzed about it as I am, and know it'll be as big in our community as I think it will, consider a pre-conf proposal, too. Hi - This is a very cool tool. I am glad to see JPEG2k stuff hitting the open source world. Very nice! That said - It would be nice if somebody could make this work without OpenURL. Frankly I would much prefer the normal URI: http://an.example.org/ds/CB_TM_QQ432?level=4&rotate=0&y=899&x=1210&h=657&w=1106 [1] to the OpenURL: http://an.example.org/djatoka/resolver?url_ver=Z39.88-2004&rft_id=info:lanl-repo/&svc_id=info:lanl-repo/svc/getRegion&svc_val_fmt=info:ofi/fmt:kev:mtx:jpeg2000&svc.format=image/jpeg&svc.level=4&svc.rotate=0&svc.region=899,1210,657,1106 and so, generally, does the web, considering that nobody uses OpenURL. I notice also that the example ajax tool puts a duplicate URI box in the lower left hand corner for permanent URIs. It would be nice to have a ‘bookmark this’ type link - as in google maps, if the current bookmarkable URI is not going to be reflected in the location bar. best, Erik 1. I have left out the HTTP Accept header (part of the HTTP request but not part of the URI), which is a more expressive replacement for the svc.format=image/jpeg parameter. pgpUeBs0lMEnp.pgp Description: PGP signature
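For illustration of the contrast above, here is a sketch of how the plain-URI style could be assembled. The `tile_uri` helper, its parameter names, and the base URL are all assumptions modeled on the example URI in the message, not djatoka's actual API:

```python
from urllib.parse import urlencode

def tile_uri(base, image_id, level, rotate, y, x, h, w):
    """Build a plain HTTP URI for an image region (hypothetical service).

    Parameter names mirror the example URI discussed in the thread;
    a real service would document its own template.
    """
    params = {"level": level, "rotate": rotate, "y": y, "x": x,
              "h": h, "w": w}
    return "%s/%s?%s" % (base, image_id, urlencode(params))

print(tile_uri("http://an.example.org/ds", "CB_TM_QQ432",
               level=4, rotate=0, y=899, x=1210, h=657, w=1106))
```

The whole request fits in one ordinary URL, which any HTTP cache, browser bookmark, or `curl` invocation can handle without knowing anything about OpenURL key/value conventions.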
Re: [CODE4LIB] Code4lib mugs?
At Mon, 3 Nov 2008 13:31:18 -0500, jean rainwater [EMAIL PROTECTED] wrote: I think the mugs are a great idea -- and thank you for your sponsorship!!! For myself, all the logoified travel mugs, t-shirts, usb keys, etc. I get clutter up my home until I finally get around to getting rid of them. Why not use the upwards of $700 that (my estimate) this will cost to sponsor another scholarship, or just to lower the cost of attendance? best, Erik Hetzner ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3 pgpFHAk8yKe39.pgp Description: PGP signature
Re: [CODE4LIB] code to guide installation of software?
At Thu, 9 Oct 2008 14:05:06 -0400, Ken Irwin [EMAIL PROTECTED] wrote: Hi folks, I've got a homegrown piece of software that I'll be presenting at a conference in a few weeks (to track title call-number request histories using III's InnReach module). I'm trying to package it up in such a way that other users will be able to use the software too, and I've never done this before. Is there any open-source or otherwise freely-available software to handle the installation of a LAMP-type product: - displaying readme type information until everything's set up - creating databases - creating data tables (in this case, with a dynamic list of fields depending on some user input) - loading up some pre-determined data into database tables - editing the config file variables I could make this up myself, but I wonder if someone has genericized this process. (I'm particularly concerned about how to effectively pre-load the data tables, not assuming the user has command-line mysql access.) This is pretty generic advice, but you should have a look at Karl Fogel’s book, Producing open source software, available online [1], particularly the chapter on ‘Packaging’. This provides a somewhat high-level view of the mechanics of packaging free software for release. It will not help with writing scripts to set up databases, which you will probably have to do by hand. best, Erik Hetzner 1. http://producingoss.com/ ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3 pgpW4inUYxjwK.pgp Description: PGP signature
Re: [CODE4LIB] anyone know about Inera?
At Sat, 12 Jul 2008 10:46:06 -0400, Godmar Back [EMAIL PROTECTED] wrote: Min, Eric, and others working in this domain - have you considered designing your software as a scalable web service from the get-go, using such frameworks as Google App Engine? You may be able to use Montepython for the CRF computations (http://montepython.sourceforge.net/) I know Min offers a WSDL wrapper around their software, but that's simply a gateway to one single-machine installation, and it's not intended as a production service at that. Thanks for the link to montepython. It looks like it might be a good tool for me to learn more about machine learning. As for my citation metadata extractor, once the training data is generated it would be trivial to scale it; there is no shared state. All that is really needed is an implementation of the Viterbi algorithm; there is one (in pure Python) on the Wikipedia page, and it is about 20 lines of code. So presumably it could be scaled on Google App Engine pretty easily. But it could be scaled on anything pretty easily; all you need is a load balancer and however many servers are necessary (not many, I would think). best, Erik Hetzner ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3 pgpDckRg5SWMS.pgp Description: PGP signature
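To give a sense of how small the decoder really is, here is a minimal pure-Python Viterbi sketch in the spirit of the one mentioned above. The toy citation-tagging model at the bottom (the states, probabilities, and token classes) is invented purely for illustration; a real extractor would learn these from training data:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for an observation sequence."""
    # V[t][s] = (probability of the best path ending in state s at time t,
    #            previous state on that path)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states)
            V[t][s] = (prob, prev)
    # pick the best final state and backtrack to recover the path
    prob, state = max((V[-1][s][0], s) for s in states)
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = V[t][state][1]
        path.append(state)
    return list(reversed(path)), prob

# Toy model: label each token of a citation as AUTHOR or TITLE.
states = ('AUTHOR', 'TITLE')
start_p = {'AUTHOR': 0.9, 'TITLE': 0.1}
trans_p = {'AUTHOR': {'AUTHOR': 0.6, 'TITLE': 0.4},
           'TITLE': {'AUTHOR': 0.05, 'TITLE': 0.95}}
emit_p = {'AUTHOR': {'name': 0.7, 'word': 0.3},
          'TITLE': {'name': 0.2, 'word': 0.8}}
path, prob = viterbi(('name', 'name', 'word', 'word'),
                     states, start_p, trans_p, emit_p)
print(path)  # ['AUTHOR', 'AUTHOR', 'TITLE', 'TITLE']
```

Since each citation is decoded independently against a fixed trained model, there is indeed no shared state: any number of workers behind a load balancer can run this in parallel.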
Re: [CODE4LIB] anyone know about Inera?
At Fri, 11 Jul 2008 14:55:18 -0500, Steve Oberg [EMAIL PROTECTED] wrote: One example: Here's the citation I have in hand: Noordzij M, Korevaar JC, Boeschoten EW, Dekker FW, Bos WJ, Krediet RT et al. The Kidney Disease Outcomes Quality Initiative (K/DOQI) Guideline for Bone Metabolism and Disease in CKD: association with mortality in dialysis patients. American Journal of Kidney Diseases 2005; 46(5):925-932. Here's the output from ParsCit. Note the problem with the article title: […] The output is a little different from what I get from the parsCit web service. The parsCit authors recently published a new paper on a new version of their system with a new engine, which you might want to look at [1]. There's more but basically it isn't accurate enough. It's very good but not good enough for what I need at this juncture. OpenURL resolvers like SFX are generally only as good as the metadata they are given to parse. I need a high level of accuracy. Maybe that's a pipe dream. I doubt that the software provided by Inera performs better than parsCit. Inera does find a DOI for that citation but that is not nearly so hard as determining which parts of a citation are which. parsCit is pretty cutting edge and provides some of the best numbers I have seen. The Flux-CiM system [2] also has pretty good numbers, but the code for it is not available. I’ve also done a little bit of work on this, which you might want to have a look at. [3] One of the problems may be that the parsCit you are dealing with has been trained on the Cora dataset of computer science citations. It is a reasonably heterogeneous dataset of citations but it doesn’t have a lot that looks like that health sciences format. If your citations are largely drawn from the health sciences you might see about training it on a health sciences dataset; you will probably get much better results. best, Erik Hetzner 1. Isaac G. Councill, C. Lee Giles, Min-Yen Kan. 
(2008) ParsCit: An open-source CRF reference string parsing package. In Proceedings of the Language Resources and Evaluation Conference (LREC 08), Marrakech, Morocco, May. Available from http://wing.comp.nus.edu.sg/parsCit/#p 2. Eli Cortez C. Vilarinho, Altigran Soares da Silva, Marcos André Gonçalves, Filipe de Sá Mesquita, Edleno Silva de Moura. FLUX-CIM: flexible unsupervised extraction of citation metadata. In Proceedings of the 7th ACM/IEEE Joint Conference on Digital Libraries (JCDL 2007), pp. 215-224. 3. A simple method for citation metadata extraction using hidden Markov models. In Proc. of the Joint Conf. on Digital Libraries (JCDL 2008), Pittsburgh, Pa., 2008. http://gales.cdlib.org/~egh/hmm-citation-extractor/ ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3 pgp64luKWEnmY.pgp Description: PGP signature
[CODE4LIB] Project Manager position at CDL in digital preservation group
(Forwarded; please direct inquiries to [EMAIL PROTECTED]) UNIVERSITY OF CALIFORNIA, CALIFORNIA DIGITAL LIBRARY TITLE: Digital Preservation Services Manager CATEGORY: Full-Time SALARY: Salary commensurate with qualifications and experience. Excellent benefits. TO APPLY: http://jobs.ucop.edu/applicants/Central?quickFind=52447 POSITION DESCRIPTION: Want to be part of a dynamic team that is working to preserve digital information for future generations? At the California Digital Library (CDL), we've developed a world-class program to preserve digital material that supports the University of California's research, teaching, and learning mission and you can be a part of it. A key member of the team is the Digital Preservation Services Manager -- reporting to the Director of the Digital Preservation Program the Manager is responsible for the day-to-day management of digital preservation services (production and development) through project management, the provision of support services (whether offered in person or online), and liaison with digital preservation service providers and support staff. In addition, the Services Manager will be responsible for translating experience of users' needs and perceptions of system capabilities in a manner that informs further refinement and extension of the digital preservation technology and service infrastructure. This is an ideal opportunity for someone with solid people skills and a passion for working in a collaborative and dynamic environment. The California Digital Library (CDL) supports the assembly and creative use of the world's scholarship and knowledge for the UC libraries and the communities they serve. In partnership with the UC libraries, the California Digital Library established the digital preservation program to ensure long-term access to the digital information that supports and results from research, teaching and learning at UC. 
JOB REQUIREMENTS: Bachelor's degree in the social sciences, public administration, library and information science or a related field and at least three years' relevant experience with development or delivery of online information services in educational, digital preservation, library, research, and/or cultural heritage settings or an equivalent combination of education and experience. Demonstrated experience to plan, evaluate, budget for and manage complex projects from their inception through to their final delivery. Plans projects and assignments and monitors performance according to priorities as demonstrated by regularly meeting established deadlines in an environment of multiple projects and changing priorities. Strong logic and quantitative reasoning skills as demonstrated by ability to review and assess a range of variables to define key issues, evaluate reasonable alternatives and translate findings into recommended changes, actions or strategies. Proven experience with and general understanding of the academic user community and the digital library/scholarly information services domain. Demonstrated experience working with user community and technology/programming staff to build use cases, functional requirements and user interface design. Excellent written and verbal communication skills as demonstrated by the ability to understand and articulate technical ideas and issues at a conceptual level and explain them clearly and concisely to non-technical staff. Demonstrated ability to operate under general direction, able to develop creative solutions to problems, and tackle issues in a self-motivated manner in a service-oriented geographically distributed team environment. Demonstrated ability to plan, evaluate, budget for and manage complex projects from their inception through to their final delivery. Please don't hesitate to contact me if you have any questions about the position. 
Patricia Cruse Director, Digital Preservation Program California Digital Library University of California 510/987-9016 pgpfUIW751boR.pgp Description: PGP signature
Re: [CODE4LIB] what's the best way to get from Portland to San Francisco on Feb 28?
At Wed, 20 Feb 2008 19:35:45 -0800, Reese, Terry [EMAIL PROTECTED] wrote: You'll want to fly. On the West Coast, taking the train is a bit of a crap shoot and I wouldn't advise it unless you had a day between when you are supposed to arrive and when you need to arrive. The few times I've taken Amtrak on the West coast between Seattle and Los Angeles, I've never been on time. I've been anywhere between 5 hours to one day late depending on the distance needed to travel. In fact, given my past experience, if I wasn't going to fly -- I would drive. It will take you approximately 12-13 hours to drive down I-5 from Portland to San Francisco. By train, almost twice as long. Terry is right. Trains here are useless. The Greyhound will take less time (!), but the lowest-cost fare is only around half the cost of a plane ride, and the refundable fare is comparable. If you can get a car or hitch a ride, I-5 is not a great trip but fast enough; 1/101 takes a bit longer but at least is a pretty nice drive through CA (don’t know about Oregon). If you do fly, Southwest flies from Portland to Oakland for a good price, and Oakland is just a BART ride from SF. best, Erik Hetzner ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3 pgpvO2hUpLHSQ.pgp Description: PGP signature
Re: [CODE4LIB] theinfo.org: for people who work with big data sets
At Tue, 15 Jan 2008 12:08:23 -0800, Aaron Swartz [EMAIL PROTECTED] wrote: Hi code4libbers! As part of my work on Open Library, I've been doing what I expect a lot of you find yourself doing: collecting big batches of MARC records, testing algorithms for processing them, building interesting ways to visualize them. And what I've found is that while the community of other people doing this in libraries is really valuable, I also have a lot to learn from people who do this sort of thing with other types of data. So I'm trying to build a code4lib-style community around people who work with large data sets of all kinds: http://theinfo.org/ I hope that you'll take a look and join the mailing lists and get involved. I think that there's a lot we could do together. Hi Aaron et al. Looks like a great project. Thanks also for plugging the WARC format. I added a bit to the wiki on this. I have a bit of trouble differentiating this from the Linking Open Data project [1]. Perhaps some info on the wiki about this would be helpful. best, Erik Hetzner 1. http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3 pgpGvb167wtCL.pgp Description: PGP signature
Re: [CODE4LIB] open source chat bots?
At Mon, 3 Dec 2007 10:14:29 -0500, Andrew Nagy [EMAIL PROTECTED] wrote: Hello - there was quite a bit of talk about chat bots a year or 2 back. I was wondering if anyone knew of an open source chat bot that works with jabber? There is a program called bitlbee that implements a jabber/aim/etc to irc gateway. If you used this you might then be able to use the vast universe of free/libre IRC bots. best, Erik Hetzner ;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3 pgpyDwijXzOHE.pgp Description: PGP signature
Re: [CODE4LIB] code.code4lib.org
At Mon, 13 Aug 2007 12:25:58 -0400, Gabriel Farrell [EMAIL PROTECTED] wrote: In #code4lib today we discussed for a bit the possibility of setting up something on code4lib.org for code hosting. The project that spurred the discussion is Ed Summers's pymarc. The following is what I would like to see: * projects live at code.code4lib.org, so pymarc, for example, would be at code.code4lib.org/pymarc * svn for version control * trac interface for each * hosted at OSU with the rest of code4lib.org, for now What will this offer that sf.net, codehaus.org, nongnu.org, savannah.gnu.org, code.google.com, gna.org, belios.de, etc. don’t? Why not simply link to http://en.wikipedia.org/wiki/Comparison_of_free_software_hosting_facilities and let people decide which they prefer? Other people mentioned the sharing of code snippets; a wiki works best for sharing code snippets, examples, and single-file sources. See http://emacswiki.org/ for a lively example. best, Erik Hetzner pgpaO9rEiQ83t.pgp Description: PGP signature