Re: [CODE4LIB] online book price comparison websites?

2014-02-26 Thread Erik Hetzner
I’ve liked bookfinder, but haven’t used it for a while.

-Erik

 At Wed, 26 Feb 2014 15:19:00 -0500,
Stephanie P Hess wrote:
 
 Try http://www.addall.com/. I used it all the time in my former incarnation
 as an Acquisitions Librarian.
 
 Cheers,
 
 Stephanie
 
 
 On Wed, Feb 26, 2014 at 3:14 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 
  Anyone have any recommendations of online sites that compare online prices
  for purchasing books?
 
  I'm looking for recommendations of sites you've actually used and been
  happy with.
 
  They need to be searchable by ISBN.
 
  Bonus is if they have good clean graphic design.
 
  Extra bonus is if they manage to include shipping prices in their price
  comparisons.
 
  Thanks!
 
  Jonathan
 
 
 
 
 -- 
 
 *Stephanie P. Hess*
 
 Electronic Resources Librarian
 
 Binghamton University
 
 Glenn G. Bartle Library
 
 4400 East Vestal Parkway
 
 Vestal, NY 13902
 
 
 
 607-777-2474

-- 
Sent from my free software system http://fsf.org/.


Re: [CODE4LIB] Python CMSs

2014-02-13 Thread Erik Hetzner
At Thu, 13 Feb 2014 15:13:58 -0900,
Coral Sheldon-Hess wrote:
 
 Hi, everyone!
 
 I've gotten clearance to totally rewrite my library's website in the
 framework/CMS of my choice (pretty much :)). As I have said on numerous
 occasions, "If I can get paid to write Python, I want to do that!" So,
 after some discussion with my department head/sysadmin, we're leaning
 toward Django.

Hi Coral,

My two cents:

I think of Django as a CMS construction kit. (Keep in mind that it was
originally developed for the newspaper business.) It’s probably more
complicated to set up than Drupal, or a CMS built with Django, but I
would guess that the time you save with the CMS up front will be more
than lost later on, when you want to make it do something it wasn’t
intended to do.

With its out-of-the-box admin interface and an HTML editor, you have
something that admins can use to generate content with hardly any work
on your part.

Basically - and this is my personal opinion, with only somewhat
limited CMS experience - the CMS experiment has failed. The dream of
something that is “customized”, not programmed, has never succeeded
and never will.

Django allows you to put together a “CMS” that fits your *exact* needs
with a little more up-front work, rather than something that requires
less up-front work but a lot more down the road.

best, Erik

-- 
Sent from my free software system http://fsf.org/.


Re: [CODE4LIB] Code4lib 2014 Diversity Scholarships: Call for Applications

2013-11-25 Thread Erik Hetzner
Hi all,

I can’t believe we are having this conversation again.

I have nothing to add except to say that rather than feed the troll,
you might do what I did, and turn your frustration at this thread
arising *once again* into a donation to the Ada Initiative or similar
organization. Sadly, it seems that one cannot contribute to the
diversity scholarships, as I would be happy to do so. If anybody knows
how, please let me know.

best, Erik


Re: [CODE4LIB] Tool for feedback on document

2013-10-16 Thread Erik Hetzner
At Wed, 16 Oct 2013 11:06:02 -0700,
Walker, David wrote:
 
 Hi all,
 
 We're looking to put together a large policy document, and would
 like to be able to solicit feedback on the text from librarians and
 staff across two dozen institutions.
 
 We could just do that via email, of course. But I thought it might
 be better to have something web-based. A wiki is not the best
 solution here, as I don't want those providing feedback to be able
 to change the text itself, but rather just leave comments.
 
 My fall back plan is to just use Wordpress, breaking the document up
 into various pages or posts, which people can then comment on. But
 it seems to me there must be a better solutions here -- maybe one
 where people can leave comments in line with the text?

Hi David,

For the GPLv3 process, the Free Software Foundation developed a web
application named stet for annotating and commenting on a text.
Apparently its successor is co-ment [1], which has a gratis “lite”
version [2]. That might meet your need, though I’ve never tried it.

best, Erik

1. http://www.co-ment.com/
2. https://lite.co-ment.com/
Sent from my free software system http://fsf.org/.


Re: [CODE4LIB] [CODE4LIB] HEADS UP - Government shutdown will mean *.loc.gov is going offline October 1

2013-09-30 Thread Erik Hetzner
At Mon, 30 Sep 2013 15:31:40 -0500,
Becky Yoose wrote:
 
 FYI - this also means that there's a very good chance that the MARC
 standards site [1] and the Source Codes site [2] will be down as well. I
 don't know if there are any mirror sites out there for these pages.
 
 Thanks,
 Becky, about to be (forcefully) parted from her standards documentation

Hi Becky,

Well, there’s always archive.org:

http://web.archive.org/web/20130816154112/http://www.loc.gov/marc/

best, Erik
Sent from my free software system http://fsf.org/.


Re: [CODE4LIB] A question about voting points

2013-04-01 Thread Erik Hetzner
At Mon, 1 Apr 2013 12:01:13 -0400,
David J. Fiander wrote:
 
 So, I just voted for the Code4Lib 2014 location. There are two possible
 venues, and I was given three points to apportion however I wish.
 
 While having multiple votes, to spread around at will, makes a lot of
 sense, shouldn't the number of votes each elector is granted be limited
 to min(3, count(options)-1)? That is, when voting for a binary, I get
 one vote, when voting on a choice of three items, I get two votes, and
 for anything more than three choices, I get three votes?
 
 I mean, realistically, one could give one vote to Austin and two votes
 to Raleigh, but why bother?

Hi David,

You actually can vote 0-3 on any option, for as many total votes as
you like.

The optimal strategy, assuming that you actually prefer one option to
another, is to vote 3 for the option you prefer and 0 for all others.
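The arithmetic behind that claim is easy to check with a quick sketch (the 0–3 per-option ballot model is as described above; the code is illustrative, not the actual conference voting software):

```python
from itertools import product

def margin(points_a: int, points_b: int) -> int:
    """Your ballot's contribution to option A's lead over option B."""
    return points_a - points_b

# Enumerate every legal two-option ballot: 0-3 points on each option,
# assigned independently.
ballots = list(product(range(4), repeat=2))

# The ballot that maximizes A's margin is the all-or-nothing one.
best = max(ballots, key=lambda ab: margin(*ab))
print(best)  # (3, 0): all points to the option you prefer, none to the other
```

Any split ballot, e.g. (1, 2) or (2, 2), only shrinks the margin your preferred option gets from your vote.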

To slightly change the subject: voting systems are a policy decision,
not a technical problem. In the case of voting for presentations (more
important to me than conference location), different voting systems
will generate a different mix of presentations. Think of the
difference between the American Congress and a parliamentary system.

The question is, does code4lib want conference presentations that are
more “first past the post” [1] or more representative of the diversity
of interests of the code4lib crowd (like a parliamentary system)? The
existing system reduces to a first past the post system, which means
that the presentations which more people prefer win, rather than
presentations that a smaller group of people might feel strongly
about.

This is a question that shouldn’t be decided by the technology; the
policy should decide the technology. A Google form might work, and
certainly hand-counted emailed votes would, given the relative
smallness of the c4l community.

Those who are interested can read more here:

  http://en.wikipedia.org/wiki/Voting_system

best, Erik

1. http://en.wikipedia.org/wiki/First-past-the-post_voting
Sent from my free software system http://fsf.org/.




Re: [CODE4LIB] GitHub Myths (was thanks and poetry)

2013-02-21 Thread Erik Hetzner
At Thu, 21 Feb 2013 10:29:28 -0500,
Shaun Ellis wrote:
 
 If you read my email, I don't tell anyone what to use, but simply 
 attempt to clear up some fallacies.  Distributed version control is new 
 to many, and I want to make sure that folks are getting accurate 
 information from this list.

Once again, these are not “fallacies”: they are disagreements.

 […]

 Pull-requests are used by repository hosting platforms to make it easier 
 to suggest patches.  GitHub and BitBucket both use the pattern, and I 
 don't understand what you mean by it being a closed tool.  If you're 
 concerned about barriers to entry, suggesting a patch using only git 
 or mercurial can be done, but I wouldn't say it's easy.
 
 ... and what Devon said.

An open tool is Internet email: I can send an email from my provider
(ucop.edu) to yours (princeton.edu). A closed tool is github, where I
need a github account to send you a pull request. An open tool would
be one where I can send a pull request from bitbucket to github.
(Obviously, bitbucket is as closed as github in this regard.)

best, Erik
Sent from my free software system http://fsf.org/.


Re: [CODE4LIB] GitHub Myths (was thanks and poetry)

2013-02-20 Thread Erik Hetzner
At Wed, 20 Feb 2013 11:20:33 -0500,
Shaun Ellis wrote:
 
   (As a general rule, for every programmer who prefers tool A, and says
   that everybody should use it, there’s a programmer who disparages tool
   A, and advocates tool B. So take what we say with a grain of salt!)
 
 It doesn't matter what tools you use, as long as you and your team are 
 able to participate easily, if you want to.  But if you want to attract 
 contributions from a given development community, then choices should 
 be balanced between the preferences of that community and what best 
 serves the project.

It does matter what tools you use, which is why people are so
passionate about them. But I agree completely that you need to balance
the preferences of the community.

 From what I've been hearing, I think there is a lot of confusion
 about GitHub. Heck, I am constantly learning about new GitHub
 features, APIs, and best practices myself. But I find it to be an
 incredibly powerful platform for moving open source, distributed
 software development forward. I am not telling anyone to use GitHub
 if they don't want to, but I want to dispel a few myths I've heard
 recently:

It’s not confusion; and these aren’t “myths”: they are disagreements.

best, Erik
Sent from my free software system http://fsf.org/.




Re: [CODE4LIB] GitHub Myths (was thanks and poetry)

2013-02-20 Thread Erik Hetzner
At Wed, 20 Feb 2013 11:50:45 -0800,
Tom Johnson wrote:
 
  but it would be difficult to replace the social network around the
 projects.
 
 Especially difficult now that GitHub is where the community is. It's
 technically possible to build a social web that works on a decentralized
 basis, but it may no longer be culturally possible. Platforms are hard to
 get down from.

Maybe. Most people today use internet email, not Compuserve email;
they use the web, not AOL keywords; and they use jabber/xmpp, not ICQ.
I don’t think it’s unreasonable to think that people will eventually
leave twitter for a status.net implementation, or github for something
else.

best, Erik
Sent from my free software system http://fsf.org/.




Re: [CODE4LIB] thanks and poetry

2013-02-19 Thread Erik Hetzner
At Sat, 16 Feb 2013 06:42:04 -0800,
Karen Coyle wrote:
 
 gitHub may have excellent startup documentation, but that startup 
 documentation describes git in programming terms mainly using *nx 
 commands. If you have never had to use a version control system (e.g. if 
 you do not write code, especially in a shared environment), "clone" 
 "push" "pull" are very poorly described. The documentation is all in 
 terms of *nx commands. Honestly, anything where this is in the 
 documentation:
 
 On Windows systems, Git looks for the .gitconfig file in the $HOME 
 directory (%USERPROFILE% in Windows’ environment), which is 
 C:\Documents and Settings\$USER or C:\Users\$USER for most people, 
 depending on version ($USER is %USERNAME% in Windows’ environment).
 
 is not going to work for anyone who doesn't work in Windows at the 
 command line.
 
 No, git is NOT for non-coders.

For what it’s worth, this programmer finds git’s interface pretty
terrible. I prefer mercurial (hg), but I don’t know if it’s any better
for people who aren’t familiar with a command line.

  http://mercurial.selenic.com/guide/

(As a general rule, for every programmer who prefers tool A, and says
that everybody should use it, there’s a programmer who disparages tool
A, and advocates tool B. So take what we say with a grain of salt!)

(And as a further aside, there’s plenty to dislike about github as
well, from its person-centric view of projects (rather than
team-centric) to its unfortunate centralizing of so much free/open
source software on one platform.)

best, Erik
Sent from my free software system http://fsf.org/.




[CODE4LIB] code4lib 2013 location

2013-01-11 Thread Erik Hetzner
Hi all,

Apparently code4lib 2013 is going to be held at the UIC Forum

  http://www.uic.edu/depts/uicforum/

I assumed it would be at the conference hotel. This is just a note so
that others do not make the same assumption, since nowhere in the
information about the conference is the location made clear.

Since the conference hotel is 1 mile from the venue, I assume
transportation will be available.

best, Erik Hetzner
Sent from my free software system http://fsf.org/.




Re: [CODE4LIB] Open source project questions

2012-12-07 Thread Erik Hetzner
At Fri, 7 Dec 2012 14:58:11 -0500,
Donna Campbell wrote:
 
 Dear Colleagues,
 
 I understand from a professional colleague, who referred me to this list,
 that there are some experienced open source programmers here. I am in the
 early stages of planning for a conference session/open source project in
 June 2013 for a different professional library organization. Here is the
 session title and description:

 […]

Hi Donna,

For understanding free/open source software development processes, you
can’t beat Karl Fogel’s book, Producing open source software,
available online: http://producingoss.com/

best, Erik
Sent from my free software system http://fsf.org/.


Re: [CODE4LIB] anti-harassment policy for code4lib?

2012-11-30 Thread Erik Hetzner
At Fri, 30 Nov 2012 11:34:41 +,
MJ Ray wrote:
 
 Esmé Cowles escow...@ucsd.edu
  Also, I've seen a number of reports over the last few years of women
  who were harassed at predominately-male tech conferences.  Taken
  together, they paint a picture of men (particularly drunken men)
  creating an atmosphere that makes a lot of people feel excluded and
  worry about being harassed or worse.  So I think a positive
  statement of values, and the general raising of consciousness of
  these issues, is a good thing.
 
 I'm a member of software.coop, which helps write library software,
 including Koha - we co-hosted KohaCon12 this summer.  Like all co-ops,
 our core values include equality.  I would like to see an
 anti-harassment policy for code4lib.
 
 However, I'm saddened that I seem to be the first to object to the
 hand-waving (number of reports) and prejudice in the above
 paragraph.  The above problems seem more likely to arise from being
 drunk or being idiots than from being men. […]

Hi MJ,

Starting from this incorrect position will lead to the wrong
harassment guidelines being drawn up. Obviously the goal is equal
respect, but you don’t get there by pretending that the root problem
is drunkenness, or that men and women treat one another with
disrespect in equal amounts. It’s not hand-waving to say that sexual
harassment happens, and that (with negligible exceptions) it is men
who are the perpetrators. To pretend otherwise will not produce an
effective anti-harassment policy.

best, Erik
Sent from my free software system http://fsf.org/.


Re: [CODE4LIB] Any libraries have their sites hosted on Amazon EC2?

2012-02-22 Thread Erik Hetzner
At Wed, 22 Feb 2012 23:34:14 +0100,
Thomas Krichel wrote:

   Roy Tennant writes

  I'd also be interested in getting some real world cost information. I
  installed an app on EC2 that went mostly unused for a couple months but
  meanwhile racked up over $300 in charges. Color me surprised.

   I am not impressed by Amazon either.  I have an instance given to me
   by a sponsor, and there I have been taken aback by the old Debian
   kernel version this puts me in.

   I rent three root servers with Hetzner.de. That's for large-scale work.
   To run a blog, a 3TB disk 16 Gig ram box from Hetzner is overkill.
   With Hetzner you have the exchange rate risk but the cost structure
   is much simpler.

Another satisfied customer.

best, Erik Hetzner

PS: But seriously, no relation.
Sent from my free software system http://fsf.org/.




Re: [CODE4LIB] Linux Laptop

2011-12-14 Thread Erik Hetzner
At Wed, 14 Dec 2011 09:54:09 -0800,
Chris Fitzpatrick wrote:
 
 Thanks everyone for all the recommendations. I knew this would be the 
 list to ask. 
 
 Sounds like Ubuntu is the overwhelming favorite. In the past when
 I've used a linux in a non-server computer, there are always some
 annoying problems... things like the laptop not waking from sleep
 mode, power consumption problems, or the microphone not working.
 
 So, I wondering about specific laptop brands/models and linux
 distributions/versions that people have found to work really well. A
 Dell or ThinkPad with Ubuntu seems to be the popular choice?
 
 But, yeah, I know i started it, but I'm going to avoid going deeper
 into my opinions on Apple vs. Windows vs. Linux and the implications
 vis-à-vis productivity, copyright, social justice, and the plight of
 the polar bear. If only out of concern that introducing this
 discussion might cause the poor mail server at ND to meltdown…..

For what it’s worth, I run Ubuntu happily on my old (2007) macbook. The
only real tricky part is the lack of 2 pointer buttons. So you don’t
need to get rid of the mac to switch off OS X.

That said, I would not buy a mac again, if only because Apple has gone
into full-bore evil mode.

Finally, in my biased experience, a system running Ubuntu is now more
usable than a system running OS X. This is my experience, and I am not
going to argue about it. :) I imagine it works even better if you buy
a system that is certified or pre-installed. And if you are interested
in a netbook, although Ubuntu has discontinued the Ubuntu Netbook
Edition, I think the Unity interface is pretty slick on a small
netbook screen.

best, Erik
Sent from my free software system http://fsf.org/.




Re: [CODE4LIB] Patents and open source projects

2011-12-05 Thread Erik Hetzner
At Mon, 5 Dec 2011 08:17:26 -0500,
Emily Lynema wrote:
 
 A colleague approached me this morning with an interesting question that I
 realized I didn't know how to answer. How are open source projects in the
 library community dancing around technologies that may have been patented
 by vendors? We were particularly wondering about this in light of open
 source ILS projects, like Kuali OLE, Koha, and Evergreen. I know OLE is
 still in the early stages, but did the folks who created Koha and Evergreen
 ever run into any problems in this area? Have library vendors historically
 pursued patents for their systems and solutions?

I don’t think libraries have a particularly unique perspective on
this: most free/open source software projects have the same issues
with patents.

The Software Freedom Law Center has some basic information about these
issues. As I recall, the “Legal basics for developers” edition of
their podcasts is useful [1], but other editions may be helpful as
well.

Basically, the standard advice for patents is what Mike Taylor gave:
ignore them. Pay attention to copyright and trademark issues (as the
Koha problem shows), but patents really don’t need to be on your
radar.

best, Erik

1. 
http://www.softwarefreedom.org/podcast/2011/aug/16/Episode-0x16-Legal-Basics-for-Developers/
Sent from my free software system http://fsf.org/.




Re: [CODE4LIB] Web archiving and WARC

2011-11-23 Thread Erik Hetzner
At Wed, 23 Nov 2011 18:30:02 -0500,
Edward M. Corrado wrote:
 
 Hello All,
 
 I need to harvest a few Web sites in order to preserve them. I'd
 really like to preserve them using the WARC file format [1] since it
 is a standard for digital preservation. I looked at Web
 Curator Tool (WCT) and Heritrix and they seem to be good at what they
 do but are built to work on a much larger scale than what I'd like to
 do -- and that comes with a cost of increased complexity. Tools like
 wget are simple to use and can easily be scripted to accomplish my
 limited task, except the standard wget and similar tools I am familiar
 with do not support WARC. Also, I haven't been able to find a tool
 that can convert zipped files created with wget to WARC.
 
 I did find a version of wget with warc support built in [1] from the
 Archive Team, so that may be my solution, but compiling software with
 "dirty" written into the name of the zip file is maybe not the best
 long-term solution. Does anyone know of any other simple tool to
 create a WARC file (either from harvesting or converting a wget or
 similar mirror/archive)?

Hi Edward,

The WCT uses Heritrix behind the scenes. Basically Heritrix or
wget+warc are your only two solutions, unless you convert to WARC from
something else. And I have never seen another crawler that gathers the
information that it needs to into the WARC file.

Heritrix isn’t that bad to get up & running. The trickier issue is
what to do with the WARC files once you have them.
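For a sense of what those tools write out, here is a toy sketch of a single WARC/1.0 response record in Python (abbreviated: real crawlers like Heritrix and wget+warc also write request records, block digests, IP addresses, and more header fields):

```python
from datetime import datetime, timezone
from uuid import uuid4

def warc_response_record(url: str, http_payload: bytes) -> bytes:
    """Wrap a raw HTTP response (status line + headers + body) in a
    minimal WARC/1.0 'response' record: named fields, a blank line,
    the block, and a trailing blank line."""
    fields = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid4()}>",
        f"WARC-Target-URI: {url}",
        f"WARC-Date: {datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_payload)}",  # length of the block below
    ]
    return "\r\n".join(fields).encode() + b"\r\n\r\n" + http_payload + b"\r\n\r\n"

payload = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>"
record = warc_response_record("http://example.org/", payload)
```

A WARC file is just a concatenation of such records (usually gzipped per record), which is part of why converting an existing wget mirror after the fact is hard: the HTTP headers are already gone.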

best, Erik
Sent from my free software system http://fsf.org/.




Re: [CODE4LIB] Plea for help from Horowhenua Library Trust to Koha Community

2011-11-22 Thread Erik Hetzner
At Tue, 22 Nov 2011 13:51:11 +1300,
Joann Ransom wrote:
 
 Horowhenua Library Trust is the birth place of Koha and the longest serving
 member of the Koha community. Back in 1999 when we were working on Koha,
 the idea that 12 years later we would be having to write an email like this
 never crossed our minds. It is with tremendous sadness that we must write
 this plea for help to you, the other members of the Koha community.

 […]

Hi Joann,

The Software Freedom Law Center (http://softwarefreedom.org) might be
able to help as well:

  The Software Freedom Law Center provides pro-bono legal services to
  developers of Free, Libre, and Open Source Software.

They list trademark defense as one of their services.

best, Erik
Sent from my free software system http://fsf.org/.




Re: [CODE4LIB] internet explorer and pdf files

2011-08-29 Thread Erik Hetzner
At Mon, 29 Aug 2011 15:30:56 -0400,
Eric Lease Morgan wrote:

 I need some technical support when it comes to Internet Explorer (IE) and PDF 
 files.

 Here at Notre Dame we have deposited a number of PDF files in a Fedora 
 repository. Some of these PDF files are available at the following URLs:

   * 
 http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:1000793/PDF1
   * 
 http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:832898/PDF1
   * 
 http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:999332/PDF1
   * 
 http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:832657/PDF1
   * 
 http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:1001919/PDF1
   * 
 http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:832818/PDF1
   * 
 http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:834207/PDF1

 Retrieving the URLs with any browser other than IE works just fine.

 Unfortunately IE's behavior is weird. The first time someone tries
 to load one of these URLs nothing happens. When someone tries to load
 another one, it loads just fine. When they re-try the first one, it
 loads. We are banging our heads against the wall here at Catholic
 Pamphlet Central. Networking issue? Port issue? IE PDF plug-in?
 Invalid HTTP headers? On-campus versus off-campus issue?

 Could some of y'all try to load some of the URLs with IE and tell me
 your experience? Other suggestions would be greatly appreciated as
 well.

Hi Eric,

As I recall, IE sometimes fetches PDFs oddly. It will do a GET, then
interrupt it, then GET the favicon.ico, then resume the original GET
using a Range header to request the rest of the PDF. (This info is
copied from an email from May 2008, so it may be out of date.) Your
server might not like that kind of abuse.
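If the server is the problem, its Range handling is the thing to check. A rough sketch of what a server has to do with a single-range header (simplified: multi-range requests and full validation per the HTTP spec are omitted):

```python
from typing import Optional, Tuple

def parse_byte_range(range_header: str, length: int) -> Optional[Tuple[int, int]]:
    """Parse a single 'bytes=...' Range header into inclusive (start, end)
    byte offsets for a resource of the given length. Simplified sketch:
    handles 'bytes=a-b', 'bytes=a-', and 'bytes=-n' (the final n bytes);
    multi-range requests and whitespace variants are not handled."""
    if not range_header.startswith("bytes="):
        return None
    start_s, _, end_s = range_header[len("bytes="):].partition("-")
    if start_s == "":                          # suffix form: the final n bytes
        return (max(0, length - int(end_s)), length - 1)
    start = int(start_s)
    end = int(end_s) if end_s else length - 1
    if start > end or start >= length:
        return None                            # unsatisfiable: answer 416
    return (start, min(end, length - 1))

print(parse_byte_range("bytes=500-", 1000))    # (500, 999)
```

A server that mishandles the resumed, ranged GET (or answers it with a full 200 body) is exactly the kind of thing that would make IE's dance fail on the first try and succeed on the second, once the PDF is cached.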

Wireshark can be your friend here.

Hope that helps!

best, Erik
Sent from my free software system http://fsf.org/.




Re: [CODE4LIB] to link or not to link: PURLs

2011-01-26 Thread Erik Hetzner
At Wed, 26 Jan 2011 13:57:42 -0600,
Pottinger, Hardy J. wrote:
 
 Hi, this topic has come up for discussion with some of my
 colleagues, and I was hoping to get a few other perspectives. For a
 public interface to a repository and/or digital library, would you
 make the handle/PURL an active hyperlink, or just provide the URL in
 text form? And why?
 
 My feeling is, making the URL an active hyperlink implies confidence
 in the PURL/Handle, and provides the user with functionality they
 expect of a hyperlink (right or option-click to copy, or bookmark).

A permanent URL should be displayed in the address bar of the user’s
browser. Then, when users do what they are going to do anyway (select
the link in the address bar & copy it), it will work.

best, Erik Hetzner
Sent from my free software system http://fsf.org/.




Re: [CODE4LIB] to link or not to link: PURLs

2011-01-26 Thread Erik Hetzner
At Wed, 26 Jan 2011 17:01:05 -0500,
Jonathan Rochkind wrote:

 It's sometimes not feasible/possible though. But it is unfortunate, and
 I agree you should always just do that where possible.

 I wonder if Google's use of the link rel=canonical element has been
 catching on with any other tools? Will any browses, delicious
 extensions, etc., bookmark that, or offer the option to bookmark that,
 or anything, instead of the one in the address bar?

The W3C WWW Technical Architecture Group has some interest in making
302 found redirects work as they were supposed to in browsers [1], but
there is not a lot of movement there, as far as I know.

In the meantime I believe that we should strive in all cases to ensure
that the URL in the address bar is the permanent URL.

best, Erik

1. http://www.w3.org/QA/2010/04/why_does_the_address_bar_show.html
Sent from my free software system http://fsf.org/.




Re: [CODE4LIB] Inlining HTTP Headers in URLs

2010-06-02 Thread Erik Hetzner
At Wed, 2 Jun 2010 15:23:05 -0400,
Jonathan Rochkind wrote:
 
 Erik Hetzner wrote:
 
  Accept-Encoding is a little strange. It is used for gzip or deflate
  compression, largely. I cannot imagine needing a link to a version
  that is gzipped.
 
  It is also hard to imagine why a link would want to specify the
  charset to be used, possibly overriding a client’s preference. If my
  browser says it can only supports UTF-8 or latin-1, it is probably
  telling the truth.

 Perhaps when the client/user-agent is not actually a web browser that 
 is simply going to display the document to the user, but is some kind of 
 other software. Imagine perhaps archiving software that, by policy, only 
 will take UTF-8 encoded documents, and you need to supply a URL which is 
 guaranteed to deliver such a thing.
 
 Sure, the hypothetical archiving software could/should(?)  just send an 
 actual HTTP header to make sure it gets a UTF-8 charset document.  But 
 maybe sometimes it makes sense to provide an identifier that actually 
 identifies/points to the UTF-8 charset version -- and that in the actual 
 in-practice real world is more guaranteed to return that UTF-8 charset 
 version from an HTTP request, without relying on content negotiation 
 which is often mis-implemented. 
 
 We could probably come up with a similar reasonable-if-edge-case for 
 encoding.

 So I'm not thinking so much of over-riding the conneg -- I'm thinking 
 of your initial useful framework, one URI identifies a more abstract 
 'document', the other identifies a specific representation. And 
 sometimes it's probably useful to identify a specific representation in 
 a specific charset, or, more of a stretch, encoding. No?

I’m certainly not thinking it should never be done. Personally I would
leave it out of SRU without a serious use case, but that is obviously
not my decision. Still, in my capacity as nobody whatsoever, I would
advise against it. ;)
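For what it's worth, the server-side half of the conneg being discussed amounts to something like this (a simplified sketch: no "*" wildcard, no malformed-header handling; the header value and charset names are illustrative):

```python
def pick_charset(accept_charset: str, available):
    """Choose a charset from an Accept-Charset header value, honoring
    q-values, highest preference first. Simplified: assumes items are
    'name' or 'name;q=0.7' with no extra whitespace."""
    prefs = []
    for item in accept_charset.split(","):
        name, _, q = item.strip().partition(";q=")
        prefs.append((float(q) if q else 1.0, name.lower()))
    for _, name in sorted(prefs, reverse=True):
        if name in available:                  # first acceptable match wins
            return name
    return None                                # nothing acceptable: 406

print(pick_charset("ISO-8859-1, utf-8;q=0.7", {"utf-8"}))  # utf-8
```

The point in this thread is precisely that many servers get this logic subtly wrong, which is why baking the charset into a distinct URL can be tempting.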
 
 I notice you didn't mention 'language', I assume we agree that one is 
 even less of a stretch, and has more clear use cases for including in a 
 URL, like content-type.

Definitely.

best, Erik
Sent from my free software system http://fsf.org/.




Re: [CODE4LIB] Inlining HTTP Headers in URLs

2010-06-01 Thread Erik Hetzner
At Tue, 1 Jun 2010 14:21:56 -0400,
LeVan,Ralph wrote:
 
 I've been sensing a flaw in HTTP for some time now.  It seems like you
 ought to be able to do everything through a URL that you can using a
 complete interface to HTTP.  Specifically, I'd love to be able to
 specify values for HTTP headers in a URL.
 
 To plug that gap locally, I'm looking for a java servlet filter that
 will look for query parameters in a URL, recognize that some of them are
 HTTP Headers, strip the query parms and set those Headers in the request
 that my java servlet eventually gets.

 Does such a filter exist already?  I've looked and not been able to find
 anything.  It seems like the work of minutes to produce such a filter.
 I'll be happy to put it out as Open Source if there's any interest.

Hi -

I am having a hard time imagining the use case for this.

Why should you allow a link to determine things like the User-Agent
header? HTTP headers are set by the client for a reason.

Furthermore, as somebody involved in web archiving, I would like to
ask you not to do this.

It is already hard enough for us to tell that:

  http://example.org/HELLOWORLD

is usually the same as:

  http://www.example.org/HELLOWORLD

or:

  http://www.example.org/helloworld

I don’t want to work in a world where this might be the same as:

  http://192.0.32.10/helloworld?HTTP-Host=example.org
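The canonicalization that lets archives treat the first two of those as "usually the same" looks roughly like this (a toy sketch using Python's urllib.parse; real crawlers such as Heritrix apply many more rules, and note that path case, HELLOWORLD vs. helloworld, can never safely be folded):

```python
from urllib.parse import urlsplit, urlunsplit

def canonicalize(url: str) -> str:
    """A toy version of the URL canonicalization web archives perform
    before deduplicating: lowercase the scheme and host, strip a leading
    'www.', drop default ports and the fragment. The path is left alone,
    since its case may be significant."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if parts.port and parts.port not in (80, 443):
        host = f"{host}:{parts.port}"          # keep only non-default ports
    return urlunsplit((parts.scheme.lower(), host, parts.path, parts.query, ""))

print(canonicalize("HTTP://www.Example.org:80/HELLOWORLD"))
# http://example.org/HELLOWORLD
```

Headers-in-the-URL would defeat exactly this kind of normalization, since `HTTP-Host=example.org` is indistinguishable from any other query parameter.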

Apologies if this sounds hostile, and thanks for reading.

best, Erik Hetzner
Sent from my free software system http://fsf.org/.




Re: [CODE4LIB] Microsoft Zentity

2010-04-28 Thread Erik Hetzner
At Wed, 28 Apr 2010 15:11:39 +0100,
David Kane wrote:
 
 Andy,
 
 It is a highly extensible platform, based on .NET and windows.  It is also
 open source!
 […]

Here is the license:

  
http://research.microsoft.com/en-us/downloads/48e60ac1-a95a-4163-a23d-28a914007743/Research-Output%20Repository%20Platform%20EULA%20%282008-06-06%29.txt

This is not an open source license.

best,
Erik Hetzner

;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] Temporary redirection and the location bar

2010-03-01 Thread Erik Hetzner
At Fri, 26 Feb 2010 10:00:15 -0500,
Esme Cowles escow...@ucsd.edu wrote:

 One solution to this problem is to use a reverse proxy instead of a
 redirect. We do this for our ARKs, so temporary URL is not shown to
 the end user at all.

 This is not a general solution, especially for people who are
 redirecting externally and are concerned about the phishing scenario
 described in:

 http://www.w3.org/TR/2001/NOTE-cuap-20010206#cp-temp-redir

 I think the ideal solution would be to have the browser location bar
 show the original URL, with a conspicuous indication of redirection,
 which would provide access to the redirection chain and the final
 URL. Bookmarking would default to the original URL, but provide the
 option of using the final URL instead.

Hi Esme -

This is a great solution, as long as you control both sides. Thanks
for pointing it out.

Your solution for the browser is very close to one proposed at [1].
This bug is now 9 years old.

I believe that there is some reluctance among browser authors to
change the behavior at this point.

If others on this list are interested in persistent identifiers, and
you have some time, I think it would be worth your while to research
this issue. It might be useful in the future to demonstrate that there
are people who care about this issue.

best,
Erik Hetzner

1. https://bugzilla.mozilla.org/show_bug.cgi?id=68423#c11
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




[CODE4LIB] Temporary redirection and the location bar

2010-02-24 Thread Erik Hetzner
Hi -

This is an issue which is of great importance to persistent
identifiers on the web, and one which I thought should be brought to
the attention of the c4l community. It affects PURLs, ARKs, and in
general any system that redirects a persistent or permanent URI to
another, temporary URI. I did not, however, realize that there was
active debate about it.

Briefly, from [1]:

  3.4 Do not treat HTTP temporary redirects as permanent redirects.

  The HTTP/1.1 specification [RFC2616] specifies several types of
  redirects. The two most common are designated by the codes 301
  (permanent) and 302 or 307 (temporary):

  * A 301 redirect means that the resource has been moved permanently
and the original requested URI is out-of-date.

  * A 302 or 307 redirect, on the other hand, means that the resource
has a temporary URI, and the original URI is still expected to
work in the future. The user should be able to bookmark, copy, or
link to the original (persistent) URI or the result of a temporary
redirect.

  Wrong: User agents usually show the user (in the user interface) the
  URI that is the result of a temporary (302 or 307) redirect, as they
  would do for a permanent (301) redirect.

There is more info at [2]. You can find the email thread at [3].

best,
Erik Hetzner

1. http://www.w3.org/TR/2001/NOTE-cuap-20010206#cp-temp-redir
2. http://www.w3.org/2001/tag/group/track/issues/57
3. 
http://www.w3.org/mid/760bcb2a1002231400m5e9b2bb6rc80bb43c37a81...@mail.gmail.com

---BeginMessage---
http://www.w3.org/2001/tag/2010/02/redirects-and-address-bar.txt

Written in the style of a blog post, and making use of
state-of-the-art theory (such as it is) of http semantics.

If this gets review and approval of some kind (especially from TimBL,
who has been the vocal campaigner on this question) I'll htmlify and
post it and be done with this action. Not sure what else to do.

Jonathan

---End Message---




Re: [CODE4LIB] Character problems with tictoc

2009-12-21 Thread Erik Hetzner
At Mon, 21 Dec 2009 14:59:01 -0500,
Glen Newton wrote:
 Thanks, Erik, some useful tools and advice.

Glad to help!

 […]

 But I don't understand why Firefox was ignoring the
  Content-Type: text/plain; charset=utf-8
 It should not be using the default charset (ISO-Latin 8859-1) for 
 this content, as it has been told the text encoding is UTF-8...

It seems to work fine in my version of Firefox (Mozilla/5.0 (X11; U;
Linux i686; en-US; rv:1.9.1.6) Gecko/20091215 Ubuntu/9.10 (karmic)
Firefox/3.5.6), with latin-1 default.

best,
Erik
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] web archiving - was: Implementing OpenURL for simple web resources

2009-09-29 Thread Erik Hetzner
At Fri, 18 Sep 2009 10:40:08 -0400,
Ed Summers wrote:
 
 Hi Erik, all

 […]

 I haven't been following this thread completely, but you've taken it
 in an interesting direction. I think you've succinctly described the
 issue with using URLs as references in an academic context: that the
 integrity of the URL is a function of time. As John Kunze has said:
 Just because the URI was the last to see a resource alive doesn't
 mean it killed them :-)
 
 I'm sure you've seen this, but Internet Archive have a nice URL
 pattern for referencing a resource representation in time:
 
   http://web.archive.org/web/{year}{month}{day}{hour}{minute}{seconds}/{url}
 
 So for example you can reference Google's homepage on December 2, 1998
 at 23:04:10 with this URL:
 
   http://web.archive.org/web/19981202230410/http://www.google.com/
 
 As Mike's email points out this is only good as long as Internet
 Archive is up and running the way we expect it to. Having any one
 organization shoulder this burden isn't particularly scalable, or
 realistic IMHO. But luckily the open and distributed nature of the
 web allows other organizations to do the same thing--like the great
 work you all are doing at the California Digital Library [1] and
 similar efforts like WebCite [2]. It would be kinda nice if these
 web archiving solutions sported similar URI patterns to enable
 discovery. For example it looks like:
 
   
 http://webarchives.cdlib.org/sw1jd4pq4k/http://books.nap.edu/html/id_questions/appB.html
 
 references a frame that surrounds an actual representation in time:
 
   
 http://webarchives.cdlib.org/wayback.public/NYUL_ag_3/20090320202246/http://books.nap.edu/html/id_questions/appB.html
 
 Which is quite similar to Internet Archive's URI pattern -- not
 surprising given the common use of Wayback [3]. But there are some
 differences. It might be nice to promote some URI patterns for web
 archiving services, so that we could theoretically create
 applications that federated search for a known resource at a given
 time. I guess in part OpenURL was designed to fill this space, but
 it might instead be a bit more natural to define a URI pattern that
 approximated what Wayback does, and come up with some way of sharing
 archive locations. I'm not sure if that last bit made any sense, or
 if some attempt at this has been made already. Maybe something to
 talk about at iPRES?
 
 I had hoped that the Zotero/InternetArchive collaboration would lead
 to some more integration between scholarly use of the web and
 archiving [3]. I guess there's still time?
 
 //Ed
 
 [1] http://webarchives.cdlib.org/
 [2] http://www.webcitation.org/
 [3] http://inkdroid.org/journal/2007/12/17/permalinks-reloaded/

Hi Ed, code4libbers -

Sorry for the late reply, but I have been on vacation.

Thanks for the insightful comments. They are very much in line with
things I have been thinking and you have got me thinking along some
other lines as well.

Our system is based on crawls, so in your example sw1jd4pq4k is a
crawl id. We discussed using the .../20090101.../http://.. scheme
directly as in wayback, but decided to use crawl-based URLs as our
primary mechanism of entry, given the constraints of our system.

(By the way, the ...wayback.public... URL should not be relied on
for permanence!)

We would, however, like to support the use of wayback style URLs as
well. There is some interest in the web archiving community of
increasing interoperability between web archive systems, so that we
can, for instance, direct a user to web.archive.org if we do not have
a URL in our system, and vice versa.

In terms of getting authors to cite archived material rather than live
web material, there are many approaches to this that I can think of,
for example:

a) Encouraging authors to link to archive.org or other web archives
rather than the live web;

b) Creating services to allow authors to take snapshots of websites,
like webcite, if necessary;

c) Rewriting links in our system to point to archives, so that, for
instance, the reference (taken from first google search for “mla
website citation”, and, of course, broken):

Lynch, Tim. DSN Trials and Tribble-ations Review. Psi Phi: Bradley's
Science Fiction Club. 1996. Bradley University. 8 Oct. 1997
http://www.bradley.edu/campusorg/psiphi/DS9/ep/503r.html.

would be rewritten to the working URL, based on the URL provided and
the access time (8 Oct. 1997):

http://web.archive.org/1997100800/http://www.bradley.edu/campusorg/psiphi/DS9/ep/503r.html
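A rewrite like (c) could be sketched as follows. The 14-digit timestamp and /web/ path segment are the Wayback Machine's URL convention; padding a date-only citation with zeros for the time of day is my own assumption about how a rewriter might behave, not archive policy:

```python
from datetime import datetime

def wayback_url(url, accessed):
    """Point a cited URL at the Internet Archive as of its access date.

    Wayback timestamps are YYYYMMDDhhmmss; a citation that gives only
    a date is padded with zeros (an assumption, not archive policy).
    """
    ts = accessed.strftime("%Y%m%d%H%M%S")
    return "http://web.archive.org/web/%s/%s" % (ts, url)

print(wayback_url(
    "http://www.bradley.edu/campusorg/psiphi/DS9/ep/503r.html",
    datetime(1997, 10, 8)))
```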

d) Publicizing web archiving so that users know that they can use tools
like the web archive to find those broken links.

e) Providing browser plugins so that users who follow 404ed links can
be given the alternative of proceeding to an archived web site.

best,
Erik Hetzner
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] Implementing OpenURL for simple web resources

2009-09-16 Thread Erik Hetzner
At Wed, 16 Sep 2009 13:39:42 +0100,
O.Stephens wrote:

 Thanks Erik,

 Yes - generally references to web sites require a 'route of access'
 (i.e. URL) and 'date accessed' - because, of course, the content of
 the website may change over time.

 Strictly you are right - if you are going to link to the resource it
 should be to the version of the page that was available at the time
 the author accessed it. This time aspect is something I'm thinking
 about more as a result of the conversations on this thread. The
 'date accessed' seems like a good way of differentiating different
 possible resolutions of a single URL. Unfortunately references don't
 have a specified format for date, and they can be expressed in a
 variety of ways - typically you'll see something like 'Accessed 14
 September 2009', but as far as I know it could be 'Accessed
 14/09/09' or I guess 'Accessed 09/14/09' etc.

 It is also true that the intent of a reference can vary - sometimes
 the intent is to point at a website, and sometimes to point to the
 content of a website at a moment in time (thinking loosely in FRBR
 terms I guess you'd say that sometimes you want to reference the
 work/expression, and sometimes the manifestation? - although I know
 FRBR gets complicated when you look at digital representations, a
 whole other discussion)

 To be honest, our project is not going to delve into this too much -
 limited both by time (we finish in February) and practicalities (I
 just don't think the library/institution is going to want to look at
 snapshotting websites, or finding archived versions for each course
 we run - I suspect it would be less effort to update the course to
 use a more current reference in the cases this problem really
 manifests itself).

 One of the other things I've come to realise is that although it is
 nice to be able to access material that is referenced, the reference
 primarily recognises the work of others, and puts your work into
 context - access is only a secondary concern. It is perfectly
 possible and OK to reference material that is not generally
 available, as a reader I may not have access to certain material,
 and over time material is destroyed so when referencing rare or
 unique texts it may become absolutely impossible to access the
 referenced source.

 I think for research publications there is a genuine and growing
 issue - especially when we start to consider the practice of
 referencing datasets which is just starting to become common
 practice in scientific research. If the dataset grows over time,
 will it be possible to see the version of the dataset used when
 doing a specific piece of research?

You might find the WebCite service [1] to be of some use. Of course it
cannot work retroactively, so it is best if researchers use it
in the first place.

best,
Erik Hetzner

1. http://www.webcitation.org/
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] Implementing OpenURL for simple web resources

2009-09-15 Thread Erik Hetzner
Hi Owen, all:

This is a very interesting problem.

At Tue, 15 Sep 2009 10:04:09 +0100,
O.Stephens wrote:
 […]

 If we look at a website it is pretty difficult to reference it
 without including the URL - it seems to be the only good way of
 describing what you are actually talking about (how many people
 think of websites by 'title', 'author' and 'publisher'?). For me,
 this leads to an immediate confusion between the description of the
 resource and the route of access to it. So, to differentiate I'm
 starting to think of the http URI in a reference like this as a URI,
 but not necessarily a URL. We then need some mechanism to check,
 given a URI, what is the URL.

 […]

 The problem with the approach (as Nate and Eric mention) is that any
 approach that relies on the URI as a identifier (whether using
 OpenURL or a script) is going to have problems as the same URI could
 be used to identify different resources over time. I think Eric's
 suggestion of using additional information to help differentiate is
 worth looking at, but I suspect that this is going to cause us
 problems - although I'd say that it is likely to cause us much less
 work than the alternative, which is allocating every single
 reference to a web resource used in our course material it's own
 persistent URL.

 […]

I might be misunderstanding you, but, I think that you are leaving out
the implicit dimension of time here - when was the URL referenced?
What can we use to represent the tuple URL, date, and how do we
retrieve an appropriate representation of this tuple? Is the most
appropriate representation the most recent version of the page,
wherever it may have moved? Or is the most appropriate representation
the page as it existed in the past? I would argue that the most
appropriate representation would be the page as it existed in the
past, not what the page looks like now - but I am biased, because I
work in web archiving.

Unfortunately this is a problem that has not been very well addressed
by the web architecture people, or the web archiving people. The web
architecture people start from the assumption that
http://example.org/ is the same resource which only varies in its
representation as a function of time, not in its identity as a
resource. The web archives people create closed systems and do not
think about how to store and resolve the tuple, URL, date.

I know this doesn’t help with your immediate problem, but I think
these are important issues.

best,
Erik Hetzner
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] Implementing OpenURL for simple web resources

2009-09-14 Thread Erik Hetzner
At Mon, 14 Sep 2009 14:48:23 +0100,
O.Stephens wrote:
 
 I'm working on a project called TELSTAR (based at the Open
 University in the UK) which is looking at the integration of
 resources into an online learning environment (see
 http://www.open.ac.uk/telstar for the basic project details). The
 project focuses on the use of References/Citations as the way in
 which resources are integrated into the teaching
 material/environment.
 
 We are going to use OpenURL to provide links (where appropriate)
 from references to full text resources. Clearly for journals,
 articles, and a number of other formats this is a relatively well
 understood practice, and implementing this should be relatively
 straightforward.
 
 However, we also want to use OpenURL even where the reference is to
 a more straightforward web resource - e.g. a web page such as
 http://www.bbc.co.uk. This is in order to ensure that links provided
 in the course material are persistent over time. A brief description
 of what we perceive to be the problem and the way we are tackling it
 is available on the project blog at
 http://www.open.ac.uk/blogs/telstar/2009/09/14/managing-link-persistence-with-openurls/
 (any comments welcome).
 
 What we are considering is the best way to represent a web page (or
 similar - pdf etc.) in an OpenURL. It looks like we could do
 something as simple as:
 
 http://resolver.address/?
 url_ver=Z39.88-2004
 url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx
 rft_id=http%3A%2F%2Fwww.bbc.co.uk
 
 Is this sufficient (and correct)? Should we consider passing fuller
 metadata? If the latter should we use the existing KEV DC
 representation, or should we be looking at defining a new metadata
 format? Any help would be very welcome.

Here are some things that I would take into consideration, not related
to the technical OpenURL question, but I think relevant anyhow.

a) What will people do if the service that you provide goes away? A
good thing about the OpenURL that you have above is that even if your
resolver no longer works, a savvy user can see that the OpenURL is
supposed to point at http://www.bbc.co.uk/. A bad thing about the old
URL that you have on your blog:

http://routes.open.ac.uk/ixbin/hixclient.exe?_IXDB_=routes_IXSPFX_=gsubmit-button=summary$+with+res_id+is+res9377

is that when that URL stops working - I will bet money it will stop
working before www.bbc.co.uk stops working - nobody will know what it
meant.

b) How can you ensure that your service will not go away? What is the
institutional commitment? If you can’t provide a stronger commitment
than, e.g., www.bbc.co.uk, is this worth doing?

c) Who will maintain that database that redirects www.bbc.co.uk to
www.neobbc.co.uk? (see second part of B above).

d) Is there a simpler solution to this problem than OpenURL?

e) Finally: how many problems will this solve? It seems to me that
this is only useful in the case of URL A1 moving to A2 (e.g.,
following an organization rename) where the organization does not
maintain a redirect. In other words, it is not particularly useful in
cases where URL A1 goes away completely (in which case there is no
unarchived URL to go to) and where a redirect is maintained from A1 to
A2 (in which case there is no need to maintain your own redirect). How
many instances of this are there? Maybe there are many; www.bbc.co.uk
is a bad example, but a journal article online might move around a
lot.
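As an aside on the technical side: the KEV OpenURL quoted above is just standard form-encoding, e.g. in Python (the resolver host is the placeholder from the example, not a real service):

```python
from urllib.parse import urlencode

# Assemble the example OpenURL quoted above; urlencode handles the
# percent-encoding of the rft_id and context-format values.
params = {
    "url_ver": "Z39.88-2004",
    "url_ctx_fmt": "info:ofi/fmt:kev:mtx:ctx",
    "rft_id": "http://www.bbc.co.uk",
}
openurl = "http://resolver.address/?" + urlencode(params)
print(openurl)
```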

Hope that is useful! Thanks for reading.

best,
Erik Hetzner




Re: [CODE4LIB] Recommend book scanner?

2009-05-01 Thread Erik Hetzner
At Fri, 1 May 2009 09:51:19 -0500,
Amanda P wrote:
 
 On the other hand, there are projects like bkrpr [2] and [3],
 home-brew scanning stations built for marginally more than the cost of
 a pair of $100 cameras.
 
 Cameras around $100 are very low quality. You could get nowhere
 near the dpi recommended for materials that need to be OCRed. The quality of
 images from cameras would be not only low, but the OCR (even with the best
 software) would probably have many errors. For someone scanning items at
 home this might be ok, but for archival quality, I would not recommend
 cameras. If you are grant funded and the grant provider requires a certain
 level of quality, you need to make sure the scanning mechanism you use can
 scan at that quality.

I know very little about digital cameras, so I hope I get this right.

According to Wikipedia, Google uses (or used) an 11MP camera (Elphel
323). You can get a 12MP camera for about $200.

With a 12MP camera you should easily be able to get 300 DPI images of
book pages and letter size archival documents. For a $100 camera you
can get more or less 300 DPI images of book pages. *

The problems I have always seen with OCR had much to do with alignment
and artifacts than with DPI. 300 DPI is fine for OCR as far as my
(limited) experience goes - as long as you have quality images.

If your intention is to scan items for preservation, then, yes, you
want higher quality - but I can’t imagine any setup for archival
quality costing anywhere near $1000. If you just want to make scans &
full text OCR available, these setups seem worth looking at -
especially if the software & workflow can be improved.

best,
Erik

* 12 MP seems to equal 4256 x 2848 pixels. To take a ‘scan’ (photo) of
a page at 300 DPI, that page would need to be 14.18 x 9.49 (dividing
pixels / 300). As long as you can get the camera close enough to the
image to not waste much space you will be getting in the close to 300
DPI range for images of size 8.5 x 11 or less. 
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] Recommend book scanner?

2009-04-30 Thread Erik Hetzner
At Wed, 29 Apr 2009 13:32:08 -0400,
Christine Schwartz wrote:
 
 We are looking into buying a book scanner which we'll probably use for
 archival papers as well--probably something in the $1,000.00 range.
 
 Any advice?

Most organizations, or at least the big ones, Internet Archive and
Google, seem to be using a design based on 2 fixed cameras rather than
a traditional scanner-type device. Is this what you had in mind?

Unfortunately none of these products are cheap. Internet Archive’s
Scribe machine cost upwards of $15k (3 years ago), [1] mostly because
it has two very expensive cameras. Google’s data is unavailable. A
company called Kirtas also sells what look like very expensive
machines of a similar design.

On the other hand, there are projects like bkrpr [2] and [3],
home-brew scanning stations build for marginally more than the cost of
a pair of $100 cameras. I think that these are a real possibility for
smaller organizations. The maturity of the software and workflow is
problematic, but with Google’s Ocropus OCR software [4] freely
available as the heart of a scanning workflow, the possibility is
there. Both bkrpr and [3] have software currently available, although
in the case of bkrpr at least the software is in the very early stages
of development.

best,
Erik Hetzner

1. 
http://redjar.org/jared/blog/archives/2006/02/10/more-details-on-open-archives-scribe-book-scanner-project/
2. http://bkrpr.org/doku.php
3. 
http://www.instructables.com/id/DIY-High-Speed-Book-Scanner-from-Trash-and-Cheap-C/
4. http://code.google.com/p/ocropus/
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] registering info: uris?

2009-04-02 Thread Erik Hetzner
At Thu, 2 Apr 2009 13:47:50 +0100,
Mike Taylor wrote:
 
 Erik Hetzner writes:
   Without external knowledge that info:doi/10./xxx is a URI, I can
   only guess.
 
 Yes, that is true.  The point is that by specifying that the rft_id
 has to be a URI, you can then use other kinds of URI without needing
 to broaden the specification.  So:
   info:doi/10./j.1475-4983.2007.00728.x
   urn:isbn:1234567890
   ftp://ftp.indexdata.com/pub/yaz
 
 [Yes, I am throwing in an ftp: URL as an identifier just because I can
 -- please let's not get sidetracked by this very bad idea :-) ]

 This is not just hypothetical: the flexibility is useful and the
 ecapsulation of the choice within a URI is helpful. I maintain an
 OpenURL resolver that handles rft_id's by invoking a plugin
 depending on what the URI scheme is; for some URI schemes, such as
 info:, that then invokes another, lower-level plugin based on the
 type (e.g. doi in the example above). Such code is straightforward
 to write, simple to understand, easy to maintain, and nice to extend
 since all you have to do is provide one more encapsulated plugin.
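A dispatcher of the kind Mike describes can be sketched in a few lines. The handler names and return values here are invented for illustration; a real resolver plugin would do actual resolution work:

```python
# Sketch of scheme-based dispatch for OpenURL rft_id values; handler
# names and return values are made up for illustration.
def handle_info(rest):
    # rest is e.g. "doi/10.1234/xyz"; a real resolver would dispatch
    # again on the info namespace ("doi", "srw", ...)
    return ("info", rest)

def handle_urn(rest):
    return ("urn", rest)

PLUGINS = {"info": handle_info, "urn": handle_urn}

def resolve(rft_id):
    scheme, sep, rest = rft_id.partition(":")
    if not sep or scheme not in PLUGINS:
        raise ValueError("no plugin for %r" % rft_id)
    return PLUGINS[scheme](rest)

print(resolve("info:doi/10.1234/xyz"))
```

Adding support for a new identifier type is then just one more entry in the plugin table, which is the extensibility point Mike is getting at.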

Thanks for the clarification. Honestly I was also responding to Rob
Sanderson’s message (bad practice, surely) where he described URIs as
‘self-describing’, which seemed to me unclear. URIs are only
self-describing insofar as they describe what type of URI they are.

I think that all of us in this discussion like URIs. I can’t speak
for, say, Andrew, but, tentatively, I think that I prefer
info:doi/10./xxx to plain 10.111/xxx. I would just prefer
http://dx.doi.org/10./xxx

   (Caveat: I have no idea what rft_id, etc, means, so maybe that
   changes the meaning of what you are saying from how I read it.)
 
 No, it doesn't :-) rft_id is the name of the parameter used in
 OpenURL 1.0 to denote a referent ID, which is the same thing I've been
 calling a Thing Identifier elsewhere in this thread.  The point with
 this part of OpenURL is precisely that you can just shove any
 identifier at the resolver and leave it to do the best job it can.
 Your only responsibility is to ensure that the identifier you give it
 is in the form of a URI, so the resolver can use simple rules to pick
 it apart and decide what to do.

Thanks.

best,
Erik Hetzner




Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-02 Thread Erik Hetzner
Hi Ray -

At Thu, 2 Apr 2009 13:48:19 -0400,
Ray Denenberg, Library of Congress wrote:
 
 You're right, if there were a web: URI scheme, the world would be a
 better place. But it's not, and the world is worse off for it.

Well, the original concept of the ‘web’ was, as I understand it, to
bring together all the existing protocols (gopher, ftp, etc.), with
the new one in addition (HTTP), with one unifying address scheme, so
that you could have this ‘web browser’ that you could use for
everything. So web: would have been nice, but probably wouldn’t have
been accepted.

As it turns out, HTTP won overwhelmingly, and the older protocols died
off.

 It shouldn't surprise anyone that I am sympathetic to Karen's
 criticisms. Here is some of my historical perspective (which may
 well differ from others').
 
 Back in the old days, URIs (or URLs) were protocol based. The ftp
 scheme was for retrieving documents via ftp. The telnet scheme was
 for telnet. And so on. Some of you may remember the ZIG (Z39.50
 Implementors Group) back when we developed the z39.50 URI scheme,
 which was around 1995. Most of us were not wise to the ways of the
 web that long ago, but we were told, by those who were, that
 z39.50r: and z39.50s: at the beginning of a URL are explicit
 indications that the URI is to be resolved by Z39.50.
 
 A few years later the semantic web was conceived and a lot of SW
 people began coining all manner of http URIs that had nothing to do
 with the http protocol. By the time the rest of the world noticed,
 there were so many that it was too late to turn back. So instead,
 history was altered. The company line became we never told you that
 the URI scheme was tied to a protocol.
 
 Instead, they should have bit the bullet and coined a new scheme.  They 
 didn't, and that's why we're in the mess we're in.

Not knowing the details of the history, your account seems correct to
me, except that I don’t think the web people tried to alter history.

I think of the web of having been a learning experience for all of us.
Yes, we used to think that the URI was tied to the protocol. But we
have learned that it doesn’t need to be, that HTTP URIs can be just
identifiers which happen to be dereferencable at the moment using the
HTTP protocol.

And it became useful to begin identifying lots of things, people and
places and so on, using identifiers, and it also seemed useful to use
a protocol that existed (HTTP), instead of coming up with the
Person-Metadata Transfer Protocol and inventing a new URI scheme
(pmtp://...) to resolve metadata about persons. Because HTTP doesn’t
care what kind of data it is sending down the line; it can happily
send metadata about people.

But that is how things grow; the http:// at the beginning of a URI may
eventually be a spandrel, when HTTP is dead and buried. And people
will wonder why the address http://dx.doi.org/10./xxx has those
funny characters in front of it. And doi.org will be long gone,
because they ran out of money, and their domain was taken over by
squatters, so we all had to agree to alter our browsers to include an
override to not use DNS to resolve the dx.doi.org domain but instead
point to a new, distributed system of DOI resolution.

We will need to fix these problems as they arise.

In my opinion, if we are interested in identifier persistence, clarity
about the difference between things and information about things,
creating a more useful web (of data), and the other things we ought to
be interested in, our time is best spent worrying about these things,
and how they can be built on top of the web. Our time is not well
spent in coming up with new ways to do things that web already does
for us.

For instance: if there is concern that HTTP URIs are not seen as being
persistent, it would be useful to try to add a method to HTTP which
indicated the persistence of an identifier. This way browsers could
display a little icon that indicated that the URI was persistent. A
user could click on this icon and get information about the
institution which claimed persistence for the URI, what the level of
support was, what other institution could back up that claim, etc.

Our time would not be well spent coming up with an elaborate scheme
for phttp:// URIs, creating a better DNS, with name control by a
better institution, and a better HTTP, with metadata, and a better
caching system, and so on. This is a lot of work and you forget what
you were trying to do in the first place, which is to make HTTP URIs
persistent.

best,
Erik
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] registering info: uris?

2009-04-02 Thread Erik Hetzner
At Thu, 2 Apr 2009 19:29:49 +0100,
Rob Sanderson wrote:
 All I meant by that was that the info:doi/ URI is more informative as to
 what the identifier actually is than just the doi by itself, which could
 be any string.  Equally, if I saw an SRW info URI like:
 
 info:srw/cql-context-set/2/relevance-1.0
 
 that's more informative than some ad-hoc URI for the same thing.
 Without the external knowledge that info:doi/xxx is a DOI and
 info:srw/cql-context-set/2/ is a cql context set administered by the
 owner with identifier '2' (which happens to be me), then they're still
 just opaque strings.

Yes, info:doi/10./xxx is more easily recognizable (‘sniffable’) as
a DOI than 10./xxx, both for humans and machines.

If we don’t know, by some external means, that a given string has the
form of some identifier, then we must guess, or sniff it.

But it is good practice to use other means to ensure that we know
whether or not any given string is an identifier, and if it is, what
type it is. Otherwise we can get confused by strings like go:home. Was
that a URI or not?

That said, I see no reason why the URI:

info:srw/cql-context-set/2/relevance-1.0

is more informative than the URI:

http://srw.org/cql-context-set/2/relevance-1.0

As you say, both are just opaque URIs without the additional
information. This information is provided by, in the first case, the
info-uri registry people, or, in the second case, by the organization
that owns srw.org.

 I could have said that http://srw.cheshire3.org/contextSets/rel/ was the
 identifier for it (SRU doesn't care) but that's the location for the
 retrieval documentation for the context set, not a collection of
 abstract access points.
 
 If srw.cheshire3.org was to go away, then people can still happily use
 the info URI with the continued knowledge that it shouldn't resolve to
 anything.

If srw.cheshire3.org goes away, people can still happily use the http
URI. (see below)

 With the potential dissolution of DLF, this has real implications, as
 DLF have an info URI namespace.  If they'd registered a bunch of URIs
 with diglib.org instead, which will go away, then people would have
 trouble using them.  Notably when someone else grabs the domain and
 starts using the URIs for something else.

The original URIs are still just as useful as identifiers, they have
become less useful as dereferenceable identifiers.

 Now if DLF were to disband AND reform, then they can happily go back to
 using info:dlf/ URIs even if they have a brand new domain.

The info:dlf/ URIs would be the same non-dereferenceable URIs they
always were, true. But what have we gained?

The issue of persistence of dereferenceability is a real one. There are
solutions, e.g, other organizations can step in to host the domain;
the ARK scheme; or, we can all agree that the diglib.org domain is too
important to let be squatted, and agree that URIs that begin
http://diglib.org/ are special, and should by-pass DNS. [1]

  I think that all of us in this discussion like URIs. I can’t speak
  for, say, Andrew, but, tentatively, I think that I prefer
  info:doi/10./xxx to plain 10.111/xxx. I would just prefer
  http://dx.doi.org/10./xxx
 
 info URIs, In My Opinion, are ideally suited for long term
 identifiers of non information resources. But http URIs are
 definitely better than something which isn't a URI at all.

Something we can all agree on! URIs are better than no URIs.

best,
Erik

1. Take with a grain of salt, as this is not something I have fully
thought out the implications of.
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] points of failure (was Re: [CODE4LIB] resolution and identification )

2009-04-02 Thread Erik Hetzner
At Thu, 2 Apr 2009 11:34:12 -0400,
Jonathan Rochkind wrote:
 […]

 I think too much of this conversation is about people's ideal vision of 
 how things _could_ work, rather than trying to make things work as best 
 as we can in the _actual world we live in_, _as well as_ planning for 
 the future when hopefully things will work even better.  You need a 
 balance between the two.

This is a good point. But as I see it, the web people - for lack of a
better word - *are* discussing the world we live in. It is those who
want to re-invent better ways of doing things who are not.

HTTP is here. HTTP works. *Everything* (save one) people want to do
with info: URIs or urn: URIs or whatever already works with HTTP.

I can count one thing that info URIs possess that HTTP URIs don’t: the
‘feature’ of not ever being dereferenceable. And even that is up in
the air - somebody could devise a method to dereference them at any
time. And then where are you?

 […]

 a) Are as likely to keep working indefinitely, in the real world of
 organizations with varying levels of understanding, resources, and
 missions.

Could somebody explain to me the way in which this identifier:

http://suphoa5d.org/phae4ohg

does not work *as an identifier*, absent any way of getting
information about the referent, in a way that:

info:doi/10.10.1126/science.298.5598.1569

does work?

I don’t mean to be argumentative - I really want to know! I think
there may be something that I am missing here.

 b) Are as likely as possible to be adopted by as many people as possible 
 for inter-operability. Having an ever-increasing number of possible 
 different URIs to represent the same thing is something to be avoided if 
 possible.

+1

 c) Are as useful as possible for the linked data vision.

+1

 […]

best,
Erik
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] points of failure (was Re: [CODE4LIB] resolution and identification )

2009-04-02 Thread Erik Hetzner
Erik Hetzner writes:
 Could somebody explain to me the way in which this identifier:
 
 http://suphoa5d.org/phae4ohg
 
 does not work *as an identifier*, absent any way of getting
 information about the referent, in a way that:
 
 info:doi/10.10.1126/science.298.5598.1569
 
 does work?

A quick clarification - before I digest Mike’s thoughts - I didn’t
mean to contrast a meaningless HTTP URI with a meaningful info URI.

What I was trying to illustrate was a non-dereferenceable URI. So,
for:

http://suphoa5d.org/phae4ohg

please read instead:

http://defunctdois.org/10.10.1126/science.298.5598.1569

Thanks!

best, Erik
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3


pgpwlL93ehevk.pgp
Description: PGP signature


Re: [CODE4LIB] registering info: uris?

2009-04-01 Thread Erik Hetzner
At Wed, 1 Apr 2009 14:34:45 +0100,
Mike Taylor wrote:
 Not quite.  Embedding a DOI in an info URI (or a URN) means that the
 identifier describes its own type.  If you just get the naked string
   10./j.1475-4983.2007.00728.x
 passed to you, say as an rft_id in an OpenURL, then you can't tell
 (except by guessing) whether it's a DOI, a SICI, and ISBN or a
 biological species identifier.  But if you get
   info:doi/10./j.1475-4983.2007.00728.x
 then you know what you've got, and can act on it accordingly.

It seems to me that you are just pushing out by one more level the
mechanism to be able to tell what something is.

That is - before you needed to know that 10./xxx was a DOI. Now
you need to know that info:doi/10./xxx is a URI.

Without external knowledge that info:doi/10./xxx is a URI, I can
only guess.

(Caveat: I have no idea what rft_id, etc, means, so maybe that changes
the meaning of what you are saying from how I read it.)

-Erik
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Erik Hetzner
At Fri, 27 Mar 2009 20:56:42 -0400,
Ross Singer wrote:

 So, in a what is probably a vain attempt to put this debate to rest, I
 created a partial redirect PURL for sudoc:

 http://purl.org/NET/sudoc/

 If you pass it any urlencoded sudoc string, you'll be redirected to
 the GPO's Aleph catalog that searches the sudoc field for that string.

 http://purl.org/NET/sudoc/E%202.11/3:EL%202

 should take you to:
 http://catalog.gpo.gov/F/?func=find-c&ccl_term=GVD%3DE%202.11/3:EL%202

 There, Jonathan, you have a dereferenceable URI structure that you
 A) don't have to worry about pointing at something misleading
 B) don't have to maintain (although I'll be happy to add whoever as a
 maintainer to this PURL)

 If the GPO ever has a better alternative, we just point the PURL at it
 in the future.

Beautiful work, Ross. Thank you.
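For anyone wiring this up in a script, here is a sketch (mine, plain Python standard library; the base URI is the PURL quoted above) of building such a redirect URI from a raw SuDoc string:

```python
from urllib.parse import quote

PURL_BASE = "http://purl.org/NET/sudoc/"  # the PURL registered above

def sudoc_purl(sudoc):
    # Percent-encode the raw SuDoc string; '/' and ':' stay literal,
    # matching the worked example in this thread.
    return PURL_BASE + quote(sudoc, safe="/:")

print(sudoc_purl("E 2.11/3:EL 2"))
# http://purl.org/NET/sudoc/E%202.11/3:EL%202
```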

best,
Erik
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Erik Hetzner
At Mon, 30 Mar 2009 10:12:39 -0400,
Ray Denenberg, Library of Congress wrote:
 Leaving aside religious issues I just want to be  sure we're clear on one
 point: the work required for the info URI process is exactly the amount of
 work required, no more no less.  It forces you to specify clear syntax and
 semantics, normalization (if applicable), etc.  If you go a different route
 because it's less work, then you're probably avoiding doing work that needs
 to be done.

Reading over your previous message regarding mapping SuDocs syntax to
URI syntax, I completely agree about the necessity of clarifying these
rules.

But I was referring to the bureaucratic overhead (little though it
may be) in registering an info: URI. This overhead may or may not be
useful, but it is there, including a submission process, internal
review, & public comments (according to the draft info URI registry
policy).

-Erik
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Erik Hetzner
At Mon, 30 Mar 2009 13:58:04 -0400,
Jonathan Rochkind wrote:
 
 It's interesting that there are at least three, if not four, viewpoints 
 being represented in this conversation.
 
 The first argument is over whether all identifiers should be resolvable 
 or not.  While I respect the argument that it's _useful_ to have 
 resolvable (to something) identifiers , I think it's an unneccesary 
 limitation to say that all identifiers _must_ be resolvable. There are 
 cases where it is infeasible on a business level to support 
 resolvability.  It may be for as simple a reason as that the body who 
 actually maintains the identifiers is not interested in providing such 
 at present.  You can argue that they _ought_ to be, but back in the real 
 world, should that stand as a barrier to anyone else using URI 
 identifiers based on that particular identifier system?  Wouldn't it be 
 better if it didn't have to be?

 [ Another obvious example is the SICI -- an identifier for a particular 
 article in a serial. Making these all resolvable in a useful way is a 
 VERY non-trivial exersize. It is not at all easy, and a solution is 
 definitely not cheap (DOI is an attempted solution; which some 
 publishers choose not to pay for; both the DOI fees and the cost of 
 building out their own infrastructure to support it). Why should we be 
 prevented from using identifiers for a particular article in a serial 
 until this difficult and expensive problem is solved?]
 
 So I don't buy that all identifiers must always be resolvable, and that 
 if we can't make an identifier resolvable we can't use it. That excludes 
 too much useful stuff.

I don’t actually think that there is anybody who is arguing that all
identifiers must be resolvable. There are people who argue that there
are identifiers which must NOT be resolvable; at least in their basic
form. (see Stuart Weibel [1]).
 
 […]

best,
Erik

1. http://weibel-lines.typepad.com/weibelines/2006/08/uncoupling_iden.html
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] registering info: uris?

2009-03-30 Thread Erik Hetzner
At Mon, 30 Mar 2009 15:52:10 -0400,
Jonathan Rochkind wrote:
 
 Erik Hetzner wrote:
 
  I don’t actually think that there is anybody who is arguing that all
  identifiers must be resolvable. There are people who argue that there
  are identifiers which must NOT be resolvable; at least in their basic
  form. (see Stuart Weibel [1]).
 
 There are indeed people arguing that, Erik, on this very list. Like,
 in the email I responded to (did you read that one?). That's why I
 wrote what I did, man! You know I'm the one who cited Stu's argument
 first on this list! I am aware of his arguments. I am aware of
 people arguing various things on this issue.

My apologies for missing Andrew’s argument and not pointing out that
you had originally pointed to Stuart’s argument.
 
 But when did someone suggest that all identifiers must be resolvable? 
 When Andrew argued that:
 
  Having unresolvable URIs is anti-Web since the Web is a hypertext
  system where links are required to make it useful.  Exposing
  unresolvable links in content on the Web doesn't make the Web 
  more useful.

 Okay, I guess he didn't actually SAY that you should never have
 non-resolvable identifiers, but he rather strongly implied it, by
 using the anti-Web epithet.

Given Andrew’s later response, I would like to restate my previous
argument:

I don’t [] think that there is anybody who is +seriously+ arguing that
all identifiers must be resolvable +to be useful as identifiers+.

best,
Erik
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] registering info: uris?

2009-03-27 Thread Erik Hetzner
At Fri, 27 Mar 2009 15:36:43 -0400,
Jonathan Rochkind wrote:
 
 Thanks Ray.
 
 Oh boy, I don't know enough about SuDoc to describe the syntax rules 
 fully. I can spend some more time with the SuDoc documentation (written 
 for a pre-computer era) and try to figure it out, or do the best I can.  
 I mean, the info registration can clearly point to the existing SuDoc 
 documentation and say one of these -- but actually describing the 
 syntax formally may or may not be possible/easy/possible-for-me-personally.
 
 I can't even tell if normalization would be required or not. I don't 
 think so.  I think SuDocs don't suffer from that problem LCCNs did to 
 require normalization, I think they already have consistent form,  but 
 I'm not certain.
 
 I'll see what I can do with it. 
 
 But Ray, you work for 'the government'.   Do you have a relationship 
 with a counter-part at GPO that might be interested in getting involved 
 with this?

Hi Jonathan -

Obviously I don’t know your requirements, but I’d like to suggest that
before going down the info: URI road, you read the W3C Technical
Architecture Group’s finding ‘URNs, Namespaces and Registries’ [1].

| Abstract

| This finding addresses the questions "When should URNs or URIs with
| novel URI schemes be used to name information resources for the
| Web?" and "Should registries be provided for such identifiers?". The
| answers given are "Rarely if ever" and "Probably not". Common
| arguments in favor of such novel naming schemas are examined, and
| their properties compared with those of the existing http: URI
| scheme.

| Three case studies are then presented, illustrating how the http:
| URI scheme can be used to achieve many of the stated requirements
| for new URI schemes.

best,
Erik Hetzner

1. http://www.w3.org/2001/tag/doc/URNsAndRegistries-50
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] registering info: uris?

2009-03-27 Thread Erik Hetzner
At Fri, 27 Mar 2009 17:18:24 -0400,
Jonathan Rochkind wrote:
 
 I am not interested in maintaining a sudoc.info registration, and 
 neither is my institution, who I wouldn't trust to maintain it (even to 
 the extent of not letting the DNS registration expire) after I left.  I 
 think even something as simple as this really needs to be committed to 
 by an organization.  So yeah, even willing to take on the 
 responsibility of owning that domain until such time as something useful 
 can be done with it, I do not have, and to me that seems like a 
 requirement, not just a nice to have.

I see your point. I believe that registering a domain would be less
work than going through an info URI registration process, but I don’t
know how difficult the info URI registration process would be (thus
bringing the conversation full circle). [1]
 
 But it certainly is another option. I feel like most people have the
 _expectation_ of http resolvability for http URIs though, even
 though it isn't actually required. If you want there to be an actual
 http server there at ALL, even one that just responds to all
 requests with a link to the SuDoc documentation, that's another
 thing you need.

I think there is a strong expectation that if I resolve a URI, I do
not end up with a domain squatter. Otherwise I am not so sure what is
expected when using an HTTP URI whose primary purpose is
identification, not dereferencing. Personally I would be happy to get
either a page telling me to check back later [2], or nothing at all.

best,
Erik Hetzner

1. My last word on this. Because I am already beating a dead horse, I
have put it in a footnote. For $100 and basically no time at all you
can have 10 years of sudoc.info. If it takes an organization more than
2 or 3 hours of work to register an info: URI, then domain
registration is a better deal, as I see it.

2. http://lccn.info/2002022641
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] Linux Public Computers - time and ticket reservation system

2009-01-05 Thread Erik Hetzner
At Mon, 5 Jan 2009 11:02:31 -0500,
Darrell Eifert deif...@hampton.lib.nh.us wrote:

 Actually, I meant 'free' in both senses, but mostly in the sense of
 'free of charge'.

Thanks for the clarification. In that case I have to agree with Karen.
Being free (as in beer) tends to be a property that results from
the principles of free (as in speech) software, but it is not a goal
in itself of most free/open source software developers.

 I hate to be blunt, but I think it's pretty safe to say that Ubuntu,
 Koha, GIMP, OpenOffice, Joomla and even the option of Linux itself
 would never exist or have gained traction and a developer base if
 these products were not freely available.

Probably - but they certainly never would have gained a developer base
if they were not free in the sense of having the source code
available, and allowing modifications. Freedom is more important to
community building than giving the software away without cost.

 Groovix and Userful are selling proprietary public-use computer
 management packages at a higher cost than their XP equivalents. If
 an open source LTSP solution were available under Linux (as in the
 Edubuntu package for schools) I would be much happier about
 recommending Linux as a solution for public-use computers in small
 to medium-sized independent public libraries.

 Again, I would invite those interested in providing help on this
 project to look at the feature list of 'Time Limit Manager' from
 Fortres -- that's what I want in an LTSP package. (As an analogy,
 remember that Koha was once just an idea floating around in some
 idealistic New Zealander's head.)

 http://www.fortresgrand.com/products/tlm/tlm.htm

Groovix claims to be GPLed, though they do not make it easy to get the
software. Here is some info:

http://wiki.groovix.org/index.php?title=GroovixSoftwareInstaller

best, Erik




Re: [CODE4LIB] COinS in OL?

2008-12-01 Thread Erik Hetzner
At Mon, 1 Dec 2008 08:15:24 -0800,
Raymond Yee wrote:

 Having COinS embedded in the Open Library would be useful.  Zotero would
 have made use of such COinS -- but because they were absent, a custom
 translator was written to grab the bibliographic metadata from OL.

Zotero also supports unAPI, which in my opinion is a much better
system for getting bibliographic metadata from web sites than COinS.

best,
Erik Hetzner
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] djatoka

2008-11-18 Thread Erik Hetzner
At Tue, 18 Nov 2008 06:13:46 -0500,
Ed Summers [EMAIL PROTECTED] wrote:
 Thanks for bringing this up Erik. It really does seem to be
 preferable to me to treat these tiles as web resources in their own
 right, and to avoid treating them like resources that need to be
 routed to with OpenURL. It is also seems preferable to leverage
 RESTful practices like using the Accept header.

 I wonder if it would improve downstream cache-ability to push parts of
 the query string into the path of the URL, for example:

   http://an.example.org/ds/CB_TM_QQ432/4/0/899/1210/657/1106

 Which could be documented with a URI template [1]:

   http://an.example.org/ds/{id}/{level}/{rotate}/{y}/{x}/{height}/{width}

 I guess I ought to read the paper (and refresh my knowledge of http
 caching) to see if portions of the URI would need to be optional, and
 what that would mean.

 Still, sure is nice to see this sort of open source work going on
 around jpeg2000. My nagging complaint about jpeg2000 as a technology
 is the somewhat limited options it presents tool wise ... and djatoka
 is certainly movement in the right direction.

It might improve cache-ability: my understanding (not checking sources
here) is that many caches do not cache GETs to URIs with query parts,
although it is allowed. However: query parameter order does matter, so
an explicitly ordered URI template could certainly prevent the problem
of:

http://example.org/?a=1&b=2

being considered a different resource than:

http://example.org/?b=2&a=1
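To make the point concrete, here is a sketch (mine, Python standard library only, not anything djatoka does) of normalizing the query part before using a URI as a cache key, so that parameter order stops mattering:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def normalize_query(uri):
    """Sort query parameters so equivalent URIs compare equal.

    A cache keyed on the normalized form treats the two example
    URIs above as the same resource.
    """
    scheme, netloc, path, query, fragment = urlsplit(uri)
    pairs = sorted(parse_qsl(query, keep_blank_values=True))
    return urlunsplit((scheme, netloc, path, urlencode(pairs), fragment))

print(normalize_query("http://example.org/?b=2&a=1"))
# http://example.org/?a=1&b=2
```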

If you read rest-discuss, there have been discussions of image
manipulation with URI query parameters/paths.

http://article.gmane.org/gmane.comp.web.services.rest/6699
http://article.gmane.org/gmane.comp.web.services.rest/8167

There seem to be advantages to both methods (query parameters/paths).

There is the further possibility of using path parameters [1], which
seems a pretty natural fit, but not widely used:

http://an.example.org/ds/{id};level={level};rotate={rotate};y={y};x={x};height={height};width={width}
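A sketch of what parsing one of those matrix-style path segments might look like (a hypothetical helper of my own, not from any library):

```python
def parse_matrix_segment(segment):
    """Split 'name;k1=v1;k2=v2' into the base name and a dict of parameters."""
    base, *params = segment.split(";")
    return base, dict(p.split("=", 1) for p in params if p)

base, params = parse_matrix_segment("CB_TM_QQ432;level=4;rotate=0;y=899;x=1210")
print(base, params["level"], params["rotate"])  # CB_TM_QQ432 4 0
```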

Additionally, I think that reading about how Amazon does (mostly) the
same thing would be useful:

http://www.aaugh.com/imageabuse.html

I think that the library community could contribute to possible work
in standardizing, to some extent, image manipulation with URIs; but I
do feel that using OpenURL will slow or prevent uptake.

best,
Erik Hetzner

1. http://www.w3.org/DesignIssues/Axioms.html#matrix
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] djatoka

2008-11-17 Thread Erik Hetzner
At Fri, 14 Nov 2008 06:10:45 -0500,
Birkin James Diana [EMAIL PROTECTED] wrote:
 
 Yesterday I attended a session of the DLF Fall Forum at which Ryan  
 Chute presented on djatoka, the open-source jpeg2000 image-server he  
 and Herbert Van de Sompel just released.
 
 It's very cool and near the top of my crowded list of things to play  
 with.
 
 If any of you have had the good fortune to experiment with it or  
 implement it into some workflow, get over to the code4libcon09  
 presentation-proposal page pronto! And if you're as jazzed about it as  
 I am, and know it'll be as big in our community as I think it will,  
 consider a pre-conf proposal, too.

Hi -

This is a very cool tool. I am glad to see JPEG2k stuff hitting the
open source world. Very nice!

That said -

It would be nice if somebody could make this work without OpenURL.

Frankly I would much prefer the normal URI:

http://an.example.org/ds/CB_TM_QQ432?level=4&rotate=0&y=899&x=1210&h=657&w=1106
[1]

to the OpenURL:

http://an.example.org/djatoka/resolver?
url_ver=Z39.88-2004&
rft_id=info:lanl-repo/&
svc_id=info:lanl-repo/svc/getRegion&
svc_val_fmt=info:ofi/fmt:kev:mtx:jpeg2000&
svc.format=image/jpeg&
svc.level=4&
svc.rotate=0&
svc.region=899,1210,657,1106

and - so does the web, generally, considering that nobody uses OpenURL.

I notice also that the example ajax tool put a duplicate URI box in
the lower left hand corner for permanent URIs. It would be nice to
have a ‘bookmark this’ type link - as in google maps, if the current
bookmarkable URI is not going to be reflected in the location bar.

best,
Erik

1. I have left out the HTTP Accept header - part of the HTTP request,
but not of the URI - which is a more expressive replacement for the
svc.format=image/jpeg parameter.
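To illustrate the footnote, a sketch of the same request with the format carried out-of-band in the Accept header, using Python's urllib (the host and path are the hypothetical ones above; the request is only built, never sent):

```python
from urllib.request import Request

# The representation (image/jpeg) is asked for in the Accept header;
# the URI itself only names the image region.
req = Request(
    "http://an.example.org/ds/CB_TM_QQ432?level=4&rotate=0&y=899&x=1210&h=657&w=1106",
    headers={"Accept": "image/jpeg"},
)
print(req.get_header("Accept"))  # image/jpeg
```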




Re: [CODE4LIB] Code4lib mugs?

2008-11-03 Thread Erik Hetzner
At Mon, 3 Nov 2008 13:31:18 -0500,
jean rainwater [EMAIL PROTECTED] wrote:

 I think the mugs are a great idea -- and thank you for your
 sponsorship!!!

For myself, all the logoified travel mugs, t-shirts, usb keys, etc. I
get clutter up my home until I finally get around to getting rid of
them. Why not use the upwards of $700 that (my estimate) this will
cost to sponsor another scholarship, or just to lower the cost of
attendance?

best, Erik Hetzner
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] code to guide installation of software?

2008-10-09 Thread Erik Hetzner
At Thu, 9 Oct 2008 14:05:06 -0400,
Ken Irwin [EMAIL PROTECTED] wrote:
 
 Hi folks,
 
 I've got a homegrown piece of software that I'll be presenting at a 
 conference in a few weeks (to track title & call-number request 
 histories using III's InnReach module). I'm trying to package it up in 
 such a way that other users will be able to use the software too, and 
 I've never done this before.
 
 Is there any open-source or otherwise freely-available software to 
 handle the installation of a LAMP-type product:
 
 - displaying readme type information until everything's set up
 - creating databases
 - creating data tables (in this case, with a dynamic list of fields 
 depending on some user input)
 - loading up some pre-determined data into database tables
 - editing the config file variables
 
 I could make this up myself, but I wonder if someone has genericized 
 this process. (I'm particularly concerned about how to effectively 
 pre-load the data tables, not assuming the user has command-line mysql 
 access.)

This is pretty generic advice, but you should have a look at Karl
Fogel’s book, Producing open source software, available online [1],
particularly the chapter on ‘Packaging’. This provides a somewhat
high-level view of the mechanics of packaging free software for
release. It will not help with writing scripts to set up databases,
which you will probably have to do by hand.

best,
Erik Hetzner

1. http://producingoss.com/ 
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] anyone know about Inera?

2008-07-14 Thread Erik Hetzner
At Sat, 12 Jul 2008 10:46:06 -0400,
Godmar Back [EMAIL PROTECTED] wrote:

 Min, Eric, and others working in this domain -

 have you considered designing your software as a scalable web service
 from the get-go, using such frameworks as Google App Engine? You may
 be able to use Montepython for the CRF computations
 (http://montepython.sourceforge.net/)

 I know Min offers a WSDL wrapper around their software, but that's
 simply a gateway to one single-machine installation, and it's not
 intended as a production service at that.

Thanks for the link to montepython. It looks like it might be a good
tool for me to learn more about machine learning.

As for my citation metadata extractor, once the training data is
generated it would be trivial to scale it; there is no shared state.
All that is really needed is an implementation of the Viterbi
algorithm, & there is one (in pure Python) on the Wikipedia page; it
is about 20 lines of code. So presumably it could be scaled on the
Google app engine pretty easily. But it could be scaled on anything
pretty easily; all you need is a load balancer and however many
servers are necessary (not many, I would think).
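For the curious, here is a sketch of that algorithm - a toy model with made-up states and probabilities of my own, not the real citation model:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (probability, state path) of the most likely labeling."""
    # Column 0: start probabilities times emission of the first token.
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for token in obs[1:]:
        V.append({})
        for s in states:
            # Best predecessor for state s at this position.
            prob, path = max(
                (V[-2][prev][0] * trans_p[prev][s] * emit_p[s][token],
                 V[-2][prev][1] + [s])
                for prev in states)
            V[-1][s] = (prob, path)
    return max(V[-1].values())

# Toy model: label each token of a citation fragment as author or title.
states = ("author", "title")
start_p = {"author": 0.9, "title": 0.1}
trans_p = {"author": {"author": 0.6, "title": 0.4},
           "title":  {"author": 0.1, "title": 0.9}}
emit_p = {"author": {"Smith,": 0.7, "J.": 0.2, "Kidney": 0.05, "disease.": 0.05},
          "title":  {"Smith,": 0.05, "J.": 0.05, "Kidney": 0.5, "disease.": 0.4}}

prob, path = viterbi(("Smith,", "J.", "Kidney", "disease."),
                     states, start_p, trans_p, emit_p)
print(path)  # ['author', 'author', 'title', 'title']
```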

best,
Erik Hetzner
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] anyone know about Inera?

2008-07-11 Thread Erik Hetzner
At Fri, 11 Jul 2008 14:55:18 -0500,
Steve Oberg [EMAIL PROTECTED] wrote:
 
 One example:
 
 Here's the citation I have in hand:
 
 Noordzij M, Korevaar JC, Boeschoten EW, Dekker FW, Bos WJ, Krediet RT et al.
 The Kidney Disease Outcomes Quality Initiative (K/DOQI) Guideline for Bone
 Metabolism and Disease in CKD: association with mortality in dialysis
 patients. American Journal of Kidney Diseases 2005; 46(5):925-932.
 
 Here's the output from ParsCit. Note the problem with the article title:

 […]

The output is a little different from what I get from the parsCit web
service. The parsCit authors recently published a new paper on a new
version of their system with a new engine, which you might want to
look at [1].

 There's more but basically it isn't accurate enough. It's very good but not
 good enough for what I need at this juncture.  OpenURL resolvers like SFX
 are generally only as good as the metadata they are given to parse.  I need
 a high level of accuracy.
 
 Maybe that's a pipe dream.

I doubt that the software provided by Inera performs better than
parsCit. Inera does find a DOI for that citation but that is not
nearly so hard as determining which parts of a citation are which.
parsCit is pretty cutting edge & provides some of the best numbers I
have seen. The Flux-CiM system [2] also has pretty good numbers, but
the code for it is not available. I’ve also done a little bit of work
on this, which you might want to have a look at. [3]

One of the problems may be that the parsCit you are dealing with has
been trained on the Cora dataset of computer science citations. It is
a reasonably heterogeneous dataset of citations but it doesn’t have a
lot that looks like that health sciences format. If your citations are
largely drawn from the health sciences you might see about training it
on a health sciences dataset; you will probably get much better
results.

best,
Erik Hetzner

1. Isaac G. Councill, C. Lee Giles, Min-Yen Kan. (2008) ParsCit: An
open-source CRF reference string parsing package. In Proceedings of
the Language Resources and Evaluation Conference (LREC 08), Marrakesh,
Morrocco, May. Available from http://wing.comp.nus.edu.sg/parsCit/#p

2. Eli Cortez C. Vilarinho, Altigran Soares da Silva, Marcos André
Gonçalves, Filipe de Sá Mesquita, Edleno Silva de Moura. FLUX-CIM:
flexible unsupervised extraction of citation metadata. In Proceedings
of the 8th ACM/IEEE Joint Conference on Digital Libraries (JCDL 2007),
pp. 215-224.

3. A simple method for citation metadata extraction using hidden
Markov models. In Proc. of the Joint Conf. on Digital Libraries (JCDL
2008), Pittsburgh, Pa., 2008.
http://gales.cdlib.org/~egh/hmm-citation-extractor/
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




[CODE4LIB] Project Manager position at CDL in digital preservation group

2008-07-10 Thread Erik Hetzner
(Forwarded; please direct inquiries to [EMAIL PROTECTED])

UNIVERSITY OF CALIFORNIA, CALIFORNIA DIGITAL LIBRARY

TITLE: Digital Preservation Services Manager

CATEGORY: Full-Time

SALARY: Salary commensurate with qualifications and experience. 
Excellent benefits.

TO APPLY: http://jobs.ucop.edu/applicants/Central?quickFind=52447

POSITION DESCRIPTION: 
Want to be part of a dynamic team that is working to preserve digital
information for future generations? At the California Digital Library
(CDL), we've developed a world-class program to preserve digital
material that supports the University of California's research,
teaching, and learning mission and you can be a part of it. A key
member of the team is the Digital Preservation Services Manager --
reporting to the Director of the Digital Preservation Program the
Manager is responsible for the day-to-day management of digital
preservation services (production and development) through project
management, the provision of support services (whether offered in
person or online), and liaison with digital preservation service
providers and support staff. In addition, the Services Manager will be
responsible for translating experience of users' needs and perceptions
of system capabilities in a manner that informs further refinement and
extension of the digital preservation technology and service
infrastructure.

This is an ideal opportunity for someone with solid people skills and
a passion for working in a collaborative and dynamic environment. 

The California Digital Library (CDL) supports the assembly and
creative use of the world's scholarship and knowledge for the UC
libraries and the communities they serve. In partnership with the UC
libraries, the California Digital Library established the digital
preservation program to ensure long-term access to the digital
information that supports and results from research, teaching and
learning at UC.

JOB REQUIREMENTS: 
Bachelor's degree in the social sciences, public administration,
library and information science or a related field and at least three
years' relevant experience with development or delivery of online
information services in educational, digital preservation, library,
research, and/or cultural heritage settings or an equivalent
combination of education and experience.

Demonstrated experience to plan, evaluate, budget for and manage
complex projects from their inception through to their final delivery.

Plans projects and assignments and monitors performance according to
priorities as demonstrated by regularly meeting established deadlines
in an environment of multiple projects and changing priorities.

Strong logic and quantitative reasoning skills as demonstrated by
ability to review and assess a range of variables to define key
issues, evaluate reasonable alternatives and translate findings into
recommended changes, actions or strategies.

Proven experience with and general understanding of the academic user
community and the digital library/scholarly information services
domain.

Demonstrated experience working with user community and
technology/programming staff to build use cases, functional
requirements and user interface design.

Excellent written and verbal communication skills as demonstrated by
the ability to understand and articulate technical ideas and issues at
a conceptual level and explain them clearly and concisely to
non-technical staff.

Demonstrated ability to operate under general direction, develop
creative solutions to problems, and tackle issues in a self-motivated
manner in a service-oriented, geographically distributed team
environment.

Please don't hesitate to contact me if you have any questions about
the position.

Patricia Cruse
Director, Digital Preservation Program
California Digital Library
University of California
510/987-9016 




Re: [CODE4LIB] what's the best way to get from Portland to San Francisco on Feb 28?

2008-02-21 Thread Erik Hetzner
At Wed, 20 Feb 2008 19:35:45 -0800,
Reese, Terry [EMAIL PROTECTED] wrote:
 
 You'll want to fly. On the West Coast, taking the train is a bit of
 a crap shoot, and I wouldn't advise it unless you had a day between
 when you are supposed to arrive and when you need to arrive. The few
 times I've taken Amtrak on the West Coast between Seattle and Los
 Angeles, I've never been on time. I've been anywhere between 5 hours
 and one day late, depending on the distance traveled. In fact,
 given my past experience, if I wasn't going to fly -- I would drive.
 It will take you approximately 12-13 hours to drive down I-5 from
 Portland to San Francisco. By train, almost twice as long.

Terry is right. Trains here are useless. The Greyhound will take less
time (!), but its lowest-cost fare is only around half the cost of a
plane ticket, and a refundable fare is comparable.

If you can get a car or hitch a ride, I-5 is not a great trip but is
fast enough; Highway 1/101 takes a bit longer but at least is a pretty
nice drive through CA (don’t know about Oregon).

If you do fly, Southwest flies from Portland to Oakland for a good
price, and Oakland is just a BART ride from SF.

best,
Erik Hetzner
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] theinfo.org: for people who work with big data sets

2008-01-15 Thread Erik Hetzner
At Tue, 15 Jan 2008 12:08:23 -0800,
Aaron Swartz [EMAIL PROTECTED] wrote:

 Hi code4libbers! As part of my work on Open Library, I've been doing
 what I expect a lot of you find yourself doing: collecting big batches
 of MARC records, testing algorithms for processing them, building
 interesting ways to visualize them. And what I've found is that while
 the community of other people doing this in libraries is really
 valuable, I also have a lot to learn from people who do this sort of
 thing with other types of data. So I'm trying to build a
 code4lib-style community around people who work with large data sets
 of all kinds:

 http://theinfo.org/

 I hope that you'll take a look and join the mailing lists and get
 involved. I think that there's a lot we could do together.

Hi Aaron et al.

Looks like a great project. Thanks also for plugging the WARC format.
I added a bit to the wiki on this.

I have a bit of trouble differentiating this from the Linking Open
Data project[1]. Perhaps some info on the wiki about this would be
helpful.

best,
Erik Hetzner

1. http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] open source chat bots?

2007-12-03 Thread Erik Hetzner
At Mon, 3 Dec 2007 10:14:29 -0500,
Andrew Nagy [EMAIL PROTECTED] wrote:

 Hello - there was quite a bit of talk about chat bots a year or 2
 back. I was wondering if anyone knew of an open source chat bot that
 works with jabber?

There is a program called BitlBee that implements a Jabber/AIM/etc.
to IRC gateway. If you used this, you might then be able to use the
vast universe of free/libre IRC bots.
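For reference, BitlBee is configured from within an IRC client once it
is running; a minimal Jabber setup might look like the sketch below.
(This is an illustration from memory of BitlBee's control-channel
commands, not part of the original message; exact syntax varies by
version, and the account address is a placeholder.)

```
# typed in the &bitlbee control channel of your IRC client
register mypassword                  # protect your BitlBee settings
account add jabber user@example.org  # add a Jabber account (placeholder address)
account on                           # connect; buddies appear as IRC users
```

After that, any IRC bot pointed at the BitlBee server can, in
principle, talk to Jabber contacts as if they were IRC users.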

best,
Erik Hetzner

;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3




Re: [CODE4LIB] code.code4lib.org

2007-08-13 Thread Erik Hetzner
At Mon, 13 Aug 2007 12:25:58 -0400,
Gabriel Farrell [EMAIL PROTECTED] wrote:

 In #code4lib today we discussed for a bit the possibility of setting up
 something on code4lib.org for code hosting.  The project that spurred
 the discussion is Ed Summers's pymarc.  The following is what I would
 like to see:

 * projects live at code.code4lib.org, so pymarc, for example, would be
   at code.code4lib.org/pymarc
 * svn for version control
 * trac interface for each
 * hosted at OSU with the rest of code4lib.org, for now

What will this offer that sf.net, codehaus.org, nongnu.org,
savannah.gnu.org, code.google.com, gna.org, belios.de, etc. don’t? Why
not simply link to
http://en.wikipedia.org/wiki/Comparison_of_free_software_hosting_facilities
and let people decide which they prefer?

Other people mentioned the sharing of code snippets; a wiki works best
for sharing code snippets, examples, and single-file source. See
http://emacswiki.org/ for a lively example.

best,
Erik Hetzner

