Re: [CODE4LIB] Question abt the code4libwomen idea
There being no rules about who can form a group does not mean there are no opinions about it, or that nobody should share an opinion. Just the opposite: the community defines itself by sharing opinions and discussing them, not by rules. There is no contradiction between thinking something is a bad idea and thinking it is not prohibited by any rules; I am surprised to find you astonished by it. Yes, you don't need permission, you can just do it. But people will have opinions about what you do, and they'll share them. That's how a community functions, no? People are encouraged to float their ideas by the community, get community feedback, and take that feedback into account -- but taking it into account doesn't mean you have to refrain from doing something if some people don't like it (especially when other people do); you can make your own decision.

I'm not even going to talk about the particular plan here, because I think this general point is much more important. The idea that rules are the only thing that can or should guide one's course of action is absolutely antithetical to a well-functioning community, online or offline. Thinking that either there should be a rule against something, or else nobody should resist or express opposition to anything that lacks a rule against it, is a recipe for stultifying bureaucracy, not community.

From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Karen Coyle [li...@kcoyle.net]
Sent: Friday, December 07, 2012 12:50 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Question abt the code4libwomen idea

Code4lib appears to have no rules about who can and cannot form a group. Therefore, if there are some folks who want a group, they should create that group. If it's successful, it's successful. If not, it'll fade away like so many start-up groups. I'm astonished at the resistance to the formation of a group on the part of people who also insist that there are no rules about forming groups.
I don't recall that any other proposal to set up a group has met this kind of resistance. In fact, we were recently reminded that if you want something done in c4l you should just do it. There is no need to ask permission. So, do it. I think the only open question is: where? e.g. what platform?

kc

On 12/7/12 9:25 AM, Salazar, Christina wrote:

Hi Bohyun, Thank you so much for raising this again. I'm still interested in such a group. I found the terminology "separate but equal" (that some on this list chose to use as a reason not to do this) offensive; it was not at all the spirit that I'd originally proposed, and no one had suggested either separate OR equal other than detractors. In fact I said that anyone would be welcome. I completely agree with what you're saying about there not being any reason why we women couldn't do both (I think we're versatile that way). I'm pretty sure I vaguely recall (maybe) there being some (similar) concerns about the local c4ls, and I would say it's very similar -- no one says that just because a person finds, say, Appalachia.c4l useful, it detracts from the global c4l. If I can find other women who are willing to work together as a women in library technology/coder/whatever support group, I will work to make something like this happen. As someone pointed out, we don't need blessing from anyone. If you will be there, I will look for you at the conference and we can discuss further. If there are other women who are interested, go us.

Christina Salazar
Systems Librarian
John Spoor Broome Library
California State University, Channel Islands
805/437-3198

p.s. Usual disclaimer about these opinions being my own and not reflecting those of my workplace/employers.
-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Bohyun Kim
Sent: Friday, December 07, 2012 8:14 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Question abt the code4libwomen idea

Hi all, I might upset some people with this, but I wanted to bring up this question. First, let me say that I think it is a terrific idea to have a code4lib learning group with or without a mentoring program. But from what I read from the listserv, it seemed to me that there were interests in a space for women, NOT as a separate group from code4lib BUT more as just a small support and discussion group for just women, INSIDE the c4l community not OUTSIDE of it. (Like an IG inside LITA or something like that...). I just wanted to know if there are still women in code4lib who are interested in this idea, because gender-specific issues won't be addressed by a code4lib learning group. (If this is the case, I am still interested in participating, and I already set up the #code4libwomen IRC channel.) Or, do we think that the initial needs that led to the talk of code4libwomen will be sufficiently met by having a
Re: [CODE4LIB] Code4lib Chicago 2013 poster
I like the picture a lot, but I'd take the male/female symbols out of it; I think they're cheesy, and the point is better made more subtly and implicitly by the image itself, rather than beating people over the head with the gender symbols. But I also have no idea why "open up the door" is apropos.

On 12/6/2012 6:24 PM, Doran, Michael D wrote:

I have come up with an unofficial Code4lib 2013 conference poster. It was inspired by the recent discussions exploring ways to be more gender inclusive in our community, to "open up the door." Although often unacknowledged, women have been coders since the beginning. The photo is from the Computer History Museum website, which states: "In 1952, mathematician Grace Hopper completed what is considered to be the first compiler, a program that allows a computer user to use English-like words instead of numbers." [1] Props there! The photo was actually taken in 1961 and shows Ms. Hopper in front of UNIVAC magnetic tape drives and holding a COBOL programming manual [2]. Bonus points for knowing additional reasons why "open up the door" is apropos.

-- Michael

[1] http://www.computerhistory.org/timeline/?year=1952
[2] http://www.computerhistory.org/collections/accession/102635875 Also see terms of use: http://www.computerhistory.org/terms/

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# do...@uta.edu
# http://rocky.uta.edu/doran/
Re: [CODE4LIB] Gender Survey Summary and Results
Hmm, it's quite possible you know more about statistics than me, but... Usually equations for calculating confidence level are based on the assumption of a random sample, not a volunteering self-selected sample. If you have a self-selected sample, then the equations for "how likely is this to be a fluke" are only accurate if your self-selected sample is representative; and there aren't really any equations that can tell you how likely your self-selected sample is to be representative -- it depends on the circumstances (which is why, for the statistical equations to be completely valid, you need a random sample). That is my understanding, anyway.

On 12/5/2012 2:18 PM, Rosalyn Metz wrote:

Ross, I totally get what you're saying -- I thought of all of that too -- but according to everything I was reading through, the likelihood that the survey's results are a fluke is extremely low. It's actually the reason I put information in the write-up about the sample size (378), population size (2,250), response rate (16.8%), confidence level (95%), and confidence interval (+/- 4.6%).

Rosalyn

On Wed, Dec 5, 2012 at 1:52 PM, Ross Singer rossfsin...@gmail.com wrote:

Thanks, Rosalyn, for setting this up and compiling the results! While it doesn't change my default position ("yes, we need more diversity among Code4lib presenters!"), I'm not sure, statistically speaking, that you can draw the conclusions you have based on the sample size, especially given the survey's topic (note, I am not saying that women aren't underrepresented in the Code4lib program). If 83% of the mailing list didn't respond, we simply know nothing about their demographics. They could be 95% male, they could be 99% female; we have no idea. I think it is safe to say that the breakdown of the 16% is probably biased towards females, simply given the subject matter and the dialogue that surrounded it. We simply cannot project that the mailing list is 57/42 from this, I don't think.
What is interesting, however, is that the number roughly corresponds to the number of seats in the conference. I think it would be interesting to see how this compares to the gender breakdown at the conference. This doesn't diminish how awesome it is that you put this together, though. Thanks again to you and Karen! -Ross.

On Dec 5, 2012, at 1:28 PM, Rosalyn Metz rosalynm...@gmail.com wrote:

Hi Friends, I put together the data and a summary for the gender survey. Now that conference and hotel registration has subsided, it's a perfect time for you to kick back and read through.

[Code4Lib] Gender Survey Data
https://docs.google.com/spreadsheet/ccc?key=0AqfFxMd8RTVhdFVQSWlPaFJ2UTh1Nmo0akNhZlVDTlE
Gender Survey Data is the raw data for the survey. Not very interesting, but you can use it to view my Pivot Tables and charts.

[Code4Lib] Gender Survey Summary
https://docs.google.com/document/d/1Hbofh63-5F9MWEk8y8C83heOkNodttASWF5juqGLQ1E/edit
Gender Survey Summary is an easy-to-read version of the above -- it's the summary I wrote about the results. Included are a brief intro, charts (from above), and a summary of the results.

Let the discussion begin,
Rosalyn

P.S. Much thanks to Karen Coyle for reviewing the summary for me before I sent it out. Also, if there are any typos or grammar mistakes, please blame my friend Abigail, who acted as my editor.
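The margin-of-error arithmetic behind the figures cited in the thread above (sample 378, population 2,250, +/- 4.6% at 95% confidence) can be reproduced with the standard formula for a proportion, including the finite population correction. The formula itself is textbook material, not something spelled out in the thread, and -- as Jonathan's point goes -- it is only valid under the assumption of a simple random sample, which a self-selected survey is not:

```python
import math

def margin_of_error(n, N, p=0.5, z=1.96):
    """Margin of error at ~95% confidence (z=1.96) for a proportion p,
    with the finite population correction for sampling n out of N.
    Only valid if the n respondents are a simple random sample of N."""
    se = math.sqrt(p * (1 - p) / n)       # standard error of the proportion
    fpc = math.sqrt((N - n) / (N - 1))    # finite population correction
    return z * se * fpc

# Survey numbers from the thread: 378 responses from a list of ~2,250.
moe = margin_of_error(378, 2250)
print(f"+/- {moe * 100:.1f}%")  # matches the reported +/- 4.6%
```

The arithmetic checks out; the disagreement in the thread is entirely about whether the random-sample assumption baked into `margin_of_error` holds for a volunteer survey.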
Re: [CODE4LIB] Help with WordPress for Code4Lib Journal
We've looked at OJS in the past and not been happy with it; we're pretty happy with WordPress, and not really looking to migrate all our operations to different software. But thanks for the suggestion. (I do think there are probably ways we could keep using WP without a custom codebase, which I personally would prefer, but it's all tradeoffs.)

On 12/5/2012 5:05 PM, Ed Sperr wrote:

Instead of maintaining a custom codebase to try and force WP to do what you want, why not just use a tool purpose-built for this kind of job? The open-source Open Journal Systems from PKP might be a good fit: http://pkp.sfu.ca/?q=ojs

Ed Sperr, M.L.I.S.
Copyright and Electronic Resources Officer
St. George's University
esp...@sgu.edu
Re: [CODE4LIB] Help with WordPress for Code4Lib Journal
While I agree with Ross in general about suggesting technical solutions without suggesting how they are going to be maintained -- agree very strongly -- and would further re-emphasize that it's important to remember that ALL software installations are living organisms (Ranganathan represent!) and need ongoing labor, not just initial install labor, I don't agree with the conclusion that the _only_ way to do this is with a central organization or "my organization, which has shown commitment through z."

I think it IS possible to run things sustainably with volunteer, decentralized, not-formal-organization labor. But my experience shows that it _isn't_ likely to work with ONE PERSON volunteering. It IS more likely to work with an actual defined collective, which feels collective responsibility for replacing individual members when they leave and maintaining its collective persistence. Is that foolproof? No. But it doesn't make it foolproof to incorporate and have a 'central organization' (you still need labor, paid or unpaid), or to have an existing organization that commits to it (it can always change its mind, or not fulfill its commitments even without actually changing its mind). There are pluses and minuses to both.

I am a firm believer in code4lib's decentralized, volunteer, community-not-organization nature. I may be becoming a minority; it seems like everyone else wants code4lib to be Official? There are pluses and minuses to both. But either way, I don't think officiality is either necessary or sufficient to ensure sustainability of tech projects (or anything else). But I fully agree with rsinger that setting up a new tech project _without_ thinking about ongoing sustainability is foolhardy, unless it's just a toy you don't mind seeing disappear when the originator loses interest.

On 12/4/2012 11:08 AM, Ross Singer wrote:

Shaun, I think you missed my point.
Our Drupal (and, per Tom's reply, WordPress -- and I'm going to take a stab in the dark and throw our MediaWiki instance into the pile) is, for all intents and purposes, unmaintained, because we have no one in charge of maintaining it. Oregon State hosts it, but that's it. Every year -- every year -- somebody proposes we ditch the diebold-o-tron for something else (Drupal modules, MediaWiki plugins, OCS, ... and most recently EasyChair), yet nobody has ever bothered to do anything besides send an email about what we should use instead. Because that requires work and commitment.

What I'm saying is, we don't have any central organization, and thus we have no real sustainable way to implement locally hosted services. The Drupal instance, the diebold-o-tron (and maybe MediaWiki) are legacies from when several of us ran a shared server in a colocation facility. We had skin in the game. And then our server got hacked because Drupal was unpatched (which sucked), and we realized we probably needed to take this a little more seriously. The problem was, though, when we moved to OSU for our hosting, we lost any power to do anything for ourselves, and since we no longer had to (nor could) maintain anything, all impetus to do so was lost. To be clear, when we ran all these services on anvil, that wasn't sustainable either! We simply don't have the organization or resources to effectively run this stuff by ourselves. That's why I'm really not interested in hearing about "some x we can run for y" if it's not backed up with "and my organization, which has shown commitment through z, will take on the task of doing all the work on this."

-Ross.

On Dec 4, 2012, at 10:41 AM, Shaun Ellis sha...@princeton.edu wrote:

Tom, can you post the plugin to Code4Lib's github so we can have a crack at it? Ross, I'm not sure how many folks on this list were aware of the Drupal upgrade troubles. Regardless, I don't think it's constructive to put new ideas on halt until it gets done.
Not everyone's a Drupal developer, but they could contribute in other ways. -Shaun

On 12/4/12 10:27 AM, Tom Keays wrote:

On Tue, Dec 4, 2012 at 9:53 AM, Ross Singer rossfsin...@gmail.com wrote:

Seriously, folks, if we can't even figure out how to upgrade our Drupal instance to a version that was released this decade, we shouldn't be discussing *new* implementations of *anything* that we have to host ourselves.

Not being one to waste a perfectly good segue... The Code4Lib Journal runs on WordPress. This was a decision made by the editorial board at the time (2007), and by and large it was a good one. Over time, one of the board members offered his technical expertise to build a few custom plugins that would streamline the workflow for publishing the journal. Out of the box, WordPress is designed to publish a string of individual articles, but we wanted to publish issues in a more traditional model, with all the articles published at one time and arranged in the issue in a specific order. We could do (and have done) all this manually, but having the plugin has
Re: [CODE4LIB] Choosing fora. was: Proliferation of Code4Lib Channels
On 12/4/2012 12:10 PM, MJ Ray wrote:

Really? I hoped if I wanted to do serious hacking, I could clone it on git.software.coop and send a pull request. If you use github *and insist everyone else does* then you lose all the decentralised networked collaboration benefits of git and it becomes a worse-and-better CVS.

A "pull request" is a feature of github.com. There is no feature of git-the-software called a "pull request." Which of course doesn't stop you from sending an email requesting a pull. A pull, including from decentralized third-party repos, is a feature of git. But yes, if you get used to the features of a particular free service, you get locked into that particular free service. This is certainly part of the overall cost/benefit of using free hosted services.
Re: [CODE4LIB] Choosing fora. was: Proliferation of Code4Lib Channels
Okay, I guess that is a feature. It generates a plain-text file you can send to someone else via email; the recipient can respond by taking manual action on their git command line. Definitely not the github pull requests people are used to.

On 12/4/2012 1:16 PM, MJ Ray wrote:

Jonathan Rochkind rochk...@jhu.edu:

On 12/4/2012 12:10 PM, MJ Ray wrote:

Really? I hoped if I wanted to do serious hacking, I could clone it on git.software.coop and send a pull request. If you use github *and insist everyone else does* then you lose all the decentralised networked collaboration benefits of git and it becomes a worse-and-better CVS.

A "pull request" is a feature of github.com. There is no feature of git-the-software called a "pull request."

I don't think that's correct. GitHub was only launched in April 2008, but here's a pull request from 2005: http://lkml.indiana.edu/hypermail/linux/kernel/0507.3/0869.html

Here's the start of the relevant page in the git software manual:

[quote]
NAME
git-request-pull - Generates a summary of pending changes

SYNOPSIS
git request-pull [-p] start url [end]

DESCRIPTION
Summarizes the changes between two commits to the standard output, and includes the given URL in the generated summary.
[/quote]

Which of course doesn't stop you from sending an email requesting a pull. A pull, including from decentralized third-party repos, is a feature of git.

It sucks that github doesn't accept emails of such git pull requests and do anything useful with them. Ignoring the huge potential of email coordination seems like missing a big feature of git.

But yes, if you get used to the features of a particular free service, you get locked into that particular free service. [...]

If one is locked in, that means it has an exit cost, so it's no longer a free service. The piper might just not need payment yet. Hope that explains,
Re: [CODE4LIB] Help with WordPress for Code4Lib Journal
I'd check out the links under "Bootcamp" here: https://help.github.com/

On 12/4/2012 5:18 PM, Mark Pernotto wrote:

As I'm clearly not well-versed in the goings-on of GitHub, I've 'forked' a response, but am not sure it worked correctly. I've zipped up and sent updates to Tom. If anyone could point me in the direction of a good GitHub tutorial (for contributing to projects such as these -- the 'creating an account' part I think I have down), I'd appreciate it. Thanks, Mark

On Tue, Dec 4, 2012 at 1:43 PM, Tom Keays tomke...@gmail.com wrote:

Let's have mine be the canonical version for now. It will be too confusing to have two versions that don't have an explicit fork relationship. https://github.com/tomkeays/issue-manager Tom

On Tue, Dec 4, 2012 at 1:56 PM, Chad Nelson chadbnel...@gmail.com wrote:

Beat me by one minute, Tom! And here it is in the code4lib github: https://github.com/code4lib/IssueManager

On Tue, Dec 4, 2012 at 1:47 PM, Tom Keays tomke...@gmail.com wrote:

On Tue, Dec 4, 2012 at 1:01 PM, Shaun Ellis sha...@princeton.edu wrote:

You can upload it to your account and then someone with admin rights to Code4Lib can fork it if they think our Code4Lib Journal custom code should be a repo there. Doesn't really matter if they do, actually. I think for debugging, it's best to point folks to the actual code the journal is running, which was forked from the official one on the Codex, right?

It was written for the Journal and originally kept in a Google Code repo (this is before Github became the de facto standard). After the author left the journal, he did a couple of updates which he uploaded to the WP Codex, but nothing for a few years. Anyway, here it is: https://github.com/tomkeays/issue-manager
Re: [CODE4LIB] Library event systems and using your API talents for good
On this thread in general, people may be interested in a previous Code4Lib Journal article on using Google Calendar via its API to embed library open hours information on a website. (Sorry if this has already been mentioned in this thread!) http://journal.code4lib.org/articles/46

It occurs to me that this could also potentially be used for library events. You'd essentially be using Google Calendar for its UI for entering and managing events (and perhaps taking advantage of its iCal feed for end-users who want such a thing), while building your own actual display UI on the Google Calendar API. It'd be free, which would be one advantage.

On 12/2/2012 10:51 AM, Michael Schofield wrote:

This will be brief and full of typos (on my phone during breakfast). I've only been with my current library for the last year, but they/we have been using an event calendar called Helios. It is cheap, and working with it is similar to WordPress. Since I've been here, we purchased Program Registration (an III product). Our public and reference staff really didn't like using it (can't blame them), so we hacked up Helios to be the front-end for our program registration backend (which only really matters IF an event requires actual registration). Anyway, just a simple plug for Helios, if only because we found it to be super malleable. Also, the support from the main guy has been super. I think the URL is refreshmy.com, but I'm on my phone and that's from memory. Sent from my iPhone

On Dec 2, 2012, at 10:35 AM, Tom Keays tomke...@gmail.com wrote:

I've been disappointed by event management/calendaring systems in general. I think there are a number of common needs that libraries all share.

Calendar systems -- scheduling single-instance or repeating-instance events seems to be the one thing you can find in a system. Basic metadata/filtering parameters should (and usually do) include: date, time, location, description.
There's variation in how rich this metadata is; some include permutations on address, campus information, mapping options, etc.; some include HTML options for the description, such as allowing links or images.

Event registration -- an added feature is the ability to allow users to register for an event and for event organizers to process that data. You don't want to have to maintain a separate registration system. Outside the scope of LibraryThing's Event API, except possibly to replicate registration links so users can sign up from within LT.

Syndication -- Jon Udell spent much of 2009 and 2010 documenting his efforts to find and then build a calendaring system that would aggregate existing sources of calendar data, the goal being reuse rather than replication. [1] His specific objective was to create a shared community calendar [2], and along the way he explored the limitations of RSS and iCal data. Once such data was captured by a calendar aggregator, it could then be resyndicated, giving users a single source for the entire community. (Udell has been less public since 2010, so I lost track of where this has been going.)

[1] http://radar.oreilly.com/2010/08/lessons-learned-building-the-e.html
[2] http://elmcity.cloudapp.net/

Embedded calendar data -- also related to syndication is the idea of including calendar metadata in a format on a web page that can be indexed by search engines and directly consumed by users via browser plugins and the like. The hCalendar microformat [3] was an attempt to embed iCal calendar data into event listings. When RDFa had its brief ascendancy a couple of years ago, it looked like hCalendar might merge into it or be replaced by similar systems, such as Schema.org's Event property [4]. However, now it looks like the HTML5 time attribute [5] might edge out Schema.org and hCalendar. Unfortunately, it seems to be impossible to encode hCalendar microformats as HTML5 microdata.
[3] http://microformats.org/wiki/hcalendar
[4] http://schema.org/Event
[5] http://html5doctor.com/the-time-element/

Ongoing events -- much of library event data doesn't fit neatly into regular calendar systems. Whereas calendaring systems only seem to be good at scheduling events with a specified time and date of occurrence, I'd also like to see a calendar system that can handle scheduling of events that are ongoing -- e.g., exhibits, art shows, library week announcements, etc. A defining feature of a good event system would be the ability to schedule both the publication and expiration dates of the event, along with a mechanism to archive expired events. From the public's point of view, an ongoing event would appear once on the calendar -- i.e., as a single event spanning several days, rather than as a series of individual listings strung over the course of several days or weeks. On a day calendar, it would show as an all-day event or announcement. On a week or month calendar, it might be a bar spanning the days or weeks for which it was in effect. My observation has been that
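Tom's "ongoing event" requirement -- one entry spanning its whole date range on a month view, plus separate publication/expiration dates that move a listing into an archive -- can be sketched as a small data model. This is an illustrative sketch of the idea, not any particular product's API; the `Event` class and its field names are hypothetical:

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Event:
    """Hypothetical event record with separate occurrence and publication windows."""
    title: str
    start: date      # first day the event is in effect
    end: date        # last day (same as start for a one-day event)
    publish: date    # when the listing should appear on the site
    expire: date     # when it should move to the archive

def month_view_entries(events, year, month):
    """Return one (title, span_start, span_end) bar per event overlapping
    the month -- a single spanning entry, not one listing per day."""
    first = date(year, month, 1)
    # Last day of the month: jump past the 28th into next month, then back up.
    last = (first.replace(day=28) + timedelta(days=4)).replace(day=1) - timedelta(days=1)
    bars = []
    for ev in events:
        if ev.start <= last and ev.end >= first:  # overlaps this month
            bars.append((ev.title, max(ev.start, first), min(ev.end, last)))
    return bars

def visible(events, today):
    """Split listings into live vs. archived by publication/expiration dates."""
    live = [ev for ev in events if ev.publish <= today <= ev.expire]
    archived = [ev for ev in events if today > ev.expire]
    return live, archived
```

Under this model, a three-week exhibit yields exactly one bar on the month view, and flips automatically from the live calendar to the archive once its expiration date passes -- the behavior Tom describes wanting from a good event system.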
Re: [CODE4LIB] Choosing fora. was: Proliferation of Code4Lib Channels
Reddit tends to be a pretty segmented place; there are many subreddits that exist, IMO, as more or less 'culturally autonomous' from the rest of reddit, with little interaction with other parts of reddit -- just people taking advantage of reddit to do their own thing. Reddit's UI makes it easy for these subreddits to stay completely separate; there's really little in the UI that brings people from one area of reddit to another or makes them end up 'combined'. I believe that there are many sub-communities on reddit that do not have this misogyny problem, even if reddit's brand has sadly become known for misogyny. I could be wrong, but I'd suggest finding out by asking friends of yours who are redditors (or finding out if friends of yours are redditors, heh), rather than assuming based on media reports that anything on reddit is doomed. Mainstream media is not very good at covering virtual communities, even still.

That said, I still don't think a Code4Lib subreddit is likely to become particularly useful; I think it's unlikely to ever achieve 'critical mass'. (It has been tried before -- there are both a code4lib and a libraries subreddit that have existed for quite a while without significant uptake, aren't there?)

On 12/2/2012 1:44 PM, Karen Coyle wrote:

*sigh* From an article about sexual harassment on reddit:

Reddit is a notoriously male-dominated forum. According to Google's DoubleClick Ad Planner, Reddit users in the U.S. https://www.google.com/adplanner/site_profile#siteDetails?uid=domain%253A%2520Reddit.comgeo=001lp=false are 72 percent male. Reddit subgroups include r/mensrights and the misogynistic r/chokeabitch, perhaps in part prompting another popular thread that asked recently, Why is Reddit so anti-women?
http://www.reddit.com/r/AskReddit/comments/x5oac/why_is_reddit_so_antiwomen_outside_of_rgonewild/ In April, a confused 14-year-old user took to the site in a desperate attempt to seek advice after she had been sexually assaulted http://www.reddit.com/r/AskReddit/comments/smbgv/i_think_i_might_have_been_raped_on_420please_help/. Jezebel chronicled the backlash, as commenters attacked the young victim for overreacting http://jezebel.com/5904323/reddit-is-officially-the-worst-possible-place-for-rape-victims-to-seek-advice. Given its reputation, the site may seem less than appropriate as a forum for effective dialogue. [1]

Which doesn't mean that we should boycott reddit, but it is good to know the make-up and culture of the tools that you use. And I think I have yet to find a thread on ANY TOPIC on slashdot that doesn't have the word tits in it somewhere. I just read the post about the possible move to a $1 coin in the US, and the first post is about strippers. FIRST POST. *sigh* Although perhaps the question now is: which will happen first -- acceptance of a $1 coin in the US, or a Slashdot thread that isn't sexist?

kc

[1] http://www.huffingtonpost.com/2012/07/30/reddit-rapists_n_1714854.html

On 11/30/12 9:51 AM, Shaun Ellis wrote:

Mark and Karen, yes, the DIY and take-initiative ethos of Code4Lib leads to a lot of channels. I think this is a good thing, as each has its strengths. But it creates chaos without more clarity on which platforms are best for certain types of communication. We have similar issues when it comes to our own internal documentation attempts at Princeton. Wiki? Git? Git Wiki? IRC? Blogosphere? Reddit? Listserv? Twitter? Why should I use any of them?!? I will say that I like Reddit for potentially controversial or philosophical discussions. It's built to keep the conversation on track and reward the most insightful/best comments with more visibility.
So, anyway, I've posted this discussion on the subreddit: http://www.reddit.com/r/code4lib/comments/1426fn/the_diy_and_takeinitiative_ethos_of_code4lib/ I also added a post on mentorship to the subreddit, since I'm particularly interested in that. Karen, while I think your comments on promotion and giving credit are important, I'm not sure how they are related to mentorship. Would love to hear more about that in the subreddit. -Shaun

On 11/30/12 12:30 PM, Mark A. Matienzo wrote:

On Fri, Nov 30, 2012 at 12:07 PM, Karen Coyle li...@kcoyle.net wrote:

Wow. We could not have gotten a better follow-up to our long thread about coders and non-coders. I don't git. I've used it to read code, but never contributed. I even downloaded a gui with a cute icon that is supposed to make it easy, and it still is going to take some learning. So I'm afraid that it either needs to be on a different platform for editing, OR someone (you know, the famed someone) is going to have to do updates for us non-gitters.

Karen, I've added instructions about how to add contributions without knowing Git to the README file: https://github.com/code4lib/antiharassment-policy/blob/master/README.md If you'd like, I'm happy to have feedback as to changes here. A small handful of people have also asked if we could move
Re: [CODE4LIB] Choosing fora. was: Proliferation of Code4Lib Channels
On 12/2/2012 9:19 PM, Esmé Cowles wrote:

I think this raises some interesting questions about community and appropriate use of the code4lib name. I just took a look at the code4lib reddit and there were comments from a handful of people. If a handful of people want to create some new channel and call it code4lib, is that OK?

It always has been up to now; it's how every single part of code4lib was created. So it's how we got here.

Who decides that?

That handful of people do.

Does it matter if it's part of something like reddit, that is seriously at odds with our budding anti-harassment policy?

I think it's far from clear that a code4lib subreddit is inherently at odds with an anti-harassment policy (OR, more importantly, at odds with our desire to be a comfortable place for all sorts of people, including people from disadvantaged groups, which is more important than any particular policy). But of course not everyone will agree on this; perhaps I am wrong. I'd suggest that if you think someone is doing something with the code4lib name that you find harmful to code4lib, you bring it up with them, either in private or in public, whichever you prefer. I think it's more productive to discuss this in the concrete than in the abstract.

I don't think we need some general policy or bureaucracy on who can use the code4lib name; we've never had one before. What we have instead is the ability to discuss _any particular use_ that people don't like -- so if you don't like the group on reddit, let's talk about THAT, specifically. If the general consensus seems to be that there shouldn't be a code4lib reddit area, then I suspect the people who created it will get rid of it. That's always happened before. If they don't, then the community can decide what we should do to distance it from code4lib (which we'd have to do anyway with non-compliant folks, even if we had a policy and bureaucracy over who was allowed to use the name).
So if this is not just hypothetical but you actually are concerned about it, please do bring it up in a separate thread on the list, or start by contacting the folks who created the reddit thing off-list, whatever you prefer.
Re: [CODE4LIB] Choosing fora. was: Proliferation of Code4Lib Channels
I don't think running one's own Hacker News or Reddit is a particularly sustainable thing to do. I say this as someone who's looked into both, for daydreams of improving the planet.code4lib stuff. They're both fairly complicated codebases, with multiple components that need to be installed, and not a lot of documentation (as they are mainly developed for their own sites, the code is made available open source, but is not really documented/supported for other people). Really, I don't think running virtually ANY software of our own for 'code4lib' is particularly sustainable; we're already having trouble sufficiently maintaining what we've already got. This stuff ends up being a lot more work than expected to maintain, and after the initial novelty of implementing a new thing! wears off (if not before :) ), it's difficult to find volunteer labor to maintain it. Especially without knowing if people are going to use the thing anyway. If there's a free service that already does what you want, why not just use it, and see if it catches on? Well, in this case, because some people are objecting to www.reddit.com as a service, I guess. Personally, I think those objections are at least in part misplaced; reddit is just a big place where lots of stuff happens (like youtube, or the internet): check out, for instance, http://www.reddit.com/r/feminism and http://www.reddit.com/r/transgender . But maybe I'm wrong on this. Either way though, I kind of suspect nobody would be using a /r/Code4Lib anyway, honestly. On the other hand, maybe I'm wrong about that too; I just went to look up the 'libraries' reddit some folks created a while ago, to show that it didn't get much use -- but found it actually IS getting some use! http://www.reddit.com/r/libraries On 12/3/2012 11:34 AM, Shaun Ellis wrote: I'm not particularly sold on Reddit.
I just think that there are some types of discussions that might be more constructive with a threaded forum than a listserv, just like there are some types of communication that are more suited to IRC or the wiki. In line with Jonathan's comments, we're not going to stop using YouTube just because it's filled with trolls, right? I only suggested and created the subreddit because it's easy to set up and requires very little maintenance. I, for one, am open to suggestions for tools with similar functionality, so long as they don't require too much maintenance. Looking at the Hacker News source code... anyone know Arc? :) -Shaun On 12/3/12 11:23 AM, Jonathan Rochkind wrote: Reddit tends to be a pretty segmented place, there are many subreddits that exist, IMO, as more or less 'culturally autonomous' from the rest of the reddit, with little interaction with other parts of reddit. Just people taking advantage of reddit to do their own thing. Reddit's UI makes it easy for these subreddits to stay completely separate, there's really little in the UI that brings people from one area of reddit to another or makes them end up 'combined'. I believe that there are many sub-communities on reddit that do not have this misogyny problem, even if reddit's brand has sadly become known for misogyny. I could be wrong, but I'd suggest finding out by asking friends of yours that are redditors (or finding out if friends of yours are redditors, heh), rather than assuming based on media reports that anything on reddit is doomed. Mainstream media is not very good at covering virtual communities, even still. That said, I still don't think a Code4Lib subreddit is likely to become a particularly useful idea, I think it's unlikely to ever achieve 'critical mass' (It has been tried before, there's both a code4lib and a libraries subreddit that have existed for quite a while without significant uptake, aren't there?) 
On 12/2/2012 1:44 PM, Karen Coyle wrote: *sigh* From an article about sexual harassment on reddit: Reddit is a notoriously male-dominated forum. According to Google's DoubleClick Ad Planner, Reddit users in the U.S. https://www.google.com/adplanner/site_profile#siteDetails?uid=domain%253A%2520Reddit.comgeo=001lp=false are 72 percent male. Reddit subgroups include r/mensrights and the misogynistic r/chokeabitch, perhaps in part prompting another popular thread that asked recently, Why is Reddit so anti-women? http://www.reddit.com/r/AskReddit/comments/x5oac/why_is_reddit_so_antiwomen_outside_of_rgonewild/ In April, a confused 14-year-old user took to the site in a desperate attempt to seek advice after she had been sexually assaulted http://www.reddit.com/r/AskReddit/comments/smbgv/i_think_i_might_have_been_raped_on_420please_help/. Jezebel chronicled the backlash, as commenters attacked the young victim for overreacting http://jezebel.com/5904323/reddit-is-officially-the-worst-possible-place-for-rape-victims-to-seek-advice. Given its reputation, the site may seem less than appropriate as a forum for effective dialogue.[1] Which doesn't mean that we should boycott reddit
Re: [CODE4LIB] Proliferation of Code4Lib Channels
A final note is that Reddit's source code is up on github. I'm not a python expert, but it could probably be set up in isolation from reddit if that's seen as a problem. It could use whatever authentication the C4L wiki uses. It has a RESTful API as well, so we could integrate it into the listserv as Ed Summers did with the jobs site. I believe you're talking about a fairly major development/maintenance project there. Installing and running the reddit software is not something I think anyone should plan on doing as a minimal part of their 'spare time', let alone modifying it and running a forked version. Nothing wrong with major development/maintenance projects done by volunteers, if someone's interested. And nothing wrong with experimenting with it to see if you can prove me wrong and it really is a trivial task. But I'd be cautious of assuming that code4lib has a bottomless reserve of volunteer labor for non-trivial tasks; we have trouble continuing to maintain the tech infrastructure we've already got. If it were me, I'd be considering cost/benefit, and not assuming something will be used just because 'if you build it they will come'. And if someone IS looking to do some self-directed development and maintenance work for the code4lib community, they should of course do it where they feel most called to do it -- but if you have an interest in helping out the Code4Lib Journal, we could use it; we're having trouble maintaining and developing our tech infrastructure there at the level we'd like, with currently available interested volunteer labor.
Re: [CODE4LIB] What is a coder?
The mission statement on the code4lib website says The Code4Lib Journal exists to foster community and share information among those interested I want to clarify that the Code4Lib Journal is a specific project with a specific list of people on its editorial board. In this way, it's unlike the broader Code4Lib community of which it's a part, which really is a community in the ordinary sense of the word, not a formal organization or project. The Journal only speaks for the Journal, not for Code4Lib. That mission statement is on the Journal website, and is the Journal's mission, as agreed upon by the Journal's founding editorial board; it is not on the code4lib website, it was agreed upon by nobody other than the Journal's founding editorial board, and it applies to nothing other than the Journal. (But I don't think I've ever heard ANYONE say that only coders are welcome at code4lib; I think it's a straw man, and I'm not sure why it's being 'debated'. I just wanted to clear up the relationship of the Code4Lib Journal and its website to code4lib. Perhaps the Journal website needs some more clarifying language? I think it probably does, hmm.)
Re: [CODE4LIB] What is a coder?
Dude, I'm positive I'm a coder because I spend a whole lot of time coding, and I think I do it pretty decently -- and search in Google is a key part of my workflow! So is debugging. Hopefully copy-and-paste-coding-without-knowing-what-i'm-doing is not, however, true. But no need to be elitist about it. From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Friscia, Michael [michael.fris...@yale.edu] Sent: Thursday, November 29, 2012 8:45 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] What is a coder? Thought process of a coder: 1- I need to open a file in my program 2- ok, I'll import IO into my application and read the definition 3- i create methods and functions around the definition and open my file Total time to deliver code: 5 mins Thought process of a non-coder 1- I need to open a file in my program 2- I open up a web browser and go to google 3- search open file in java 4- copy/paste the code I find 5- can't figure out why it doesn't work, go back to step 3 and try a different person's code 6- really stuck, contemplates changing the programming language 7- runs some searches on easier programming languages 8- goes back to Google and tries new search terms and gets different results 9- finally get it working 10- remove all comments from the copy/paste code so it looks like I wrote it. Total time to deliver code: 5 hours ___ Michael Friscia Manager, Digital Library Programming Services Yale University Library (203) 432-1856 -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Mark A. Matienzo Sent: Wednesday, November 28, 2012 10:03 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] What is a coder? Some discussion (both on-list and otherwise) has referred to coders, and some discussion as such has raised the question whether non-coders are welcome at code4lib. What's a coder? I'm not trying to be difficult - I want to make code4lib as inclusive as possible. Mark A. 
Matienzo m...@matienzo.org Digital Archivist, Manuscripts and Archives, Yale University Library Technical Architect, ArchivesSpace
Re: [CODE4LIB] What is a coder?
The statement on the actual code4lib website (not the Journal's website) can be found here: http://code4lib.org/about I have no idea how old that statement is, or how often it's been changed -- it looks like it's got some stuff added to it at least as a result of recent discussion? But at any rate, it probably wasn't arrived at by consensus of any large group of people; it's probably something somebody at some point thought made sense and put there, and it's stayed there because nobody found it objectionable (possibly because nobody noticed it). I don't think there's anything wrong with that, I think that's how our community works! But it means it's not set in stone or anything, or representative of 'everybody', or representative of everyone's thinking. Particular projects done by code4lib people have particular missions and goals and organizational structures -- code4lib in general has none of these things, it's just a bunch of people, nothing more or less. (With regard to that 'about' statement particularly, if you want to change the 'about' there: draw up a draft, get feedback from others on it, install it when general consensus seems to be reached. It sounds like some people may have been doing that recently, although perhaps they skipped the tell folks you're changing it and get feedback step. :) ) But anyway, here's the 'about' statement on the actual code4lib website. (Personally, I would not refer to code4lib as a collective, as 'collective' to me means more of a cohesive organization with defined membership; I'd call it a 'community'.) code4lib isn't entirely about code or libraries. It is a volunteer-driven collective of hackers, designers, architects, curators, catalogers, artists and instigators from around the world, who largely work for and with libraries, archives and museums on technology stuff.
It started in the fall of 2003 as a mailing list when a group of library programmers decided to create an overarching community agnostic towards any particular language or technology. Code4Lib is dedicated to providing a harassment-free community experience for everyone regardless of gender, sexual orientation, disability, physical appearance, body size, race, or religion. For more information, please see our emerging CodeofConduct4Lib. code4lib grew out of other efforts such as the Access Conference, web4lib, perl4lib, /usr/lib/info (2003-2005, see archive.org) and oss4lib which allow technology folks in libraries, archives and museums to informally share approaches, techniques, and code across institutional and project divides. Soon after the mailing list was created, the community decided to set up a #code4lib IRC channel (chat room) on freenode. The first face-to-face meeting was held in 2005 in Chicago, Illinois, USA and the now-annual conference started in 2006 in Corvallis, Oregon, USA, and has continued since. Local meetings have also sprung up from time to time and are encouraged. A volunteer effort manages an edited online journal that publishes relevant articles from the field in a timely fashion. Things get done because people share ideas, step up to lead, and work together, not because anyone is in charge. We prefer to make community decisions by holding open votes, e.g. on who gets to present at our conferences, where to host them, etc. If you've got an idea or an itch to scratch, please join in; we welcome your participation! If you are interested in joining the community: sign up to the discussion list; join the Facebook or LinkedIn groups; follow us on Twitter; subscribe to our blogs; or get right to the heart of it in the chat room on IRC. From: Jonathan Rochkind Sent: Thursday, November 29, 2012 9:02 AM To: Code for Libraries Subject: RE: [CODE4LIB] What is a coder?
Re: [CODE4LIB] What is a coder?
I think that _everyone_ who finds our topics and discussions interesting and useful is welcome at the conference, on the listserv, in IRC, etc. However, at the same time, I will confess that I personally find the proliferation of archival/repository topics at the conference disappointing. I feel like there are many, many venues for discussing institutional repositories and digital archiving. Many other venues (journals, conferences, listservs, organizations) that purport to be about library technology in general or digital libraries really end up being focused almost exclusively on archival/repository matters. When I first found code4lib, what was exciting to me is that finally there was a venue for people discussing and trying to DO technological innovation in actual 'ordinary' library user services, in helping our patrons do all the things that libraries have traditionally tried to help them do, and which need an upgraded tech infrastructure to continue helping them do in the 21st century. But that's just me. I don't think there's _anyone_ that's interested in drawing lines around _who_ can participate in 'code4lib'. But I think almost _everyone_ has an interest in _what_ the topics and discussions at code4lib are. Because that's what makes it code4lib: there's already a web4lib listserv, there's already a D-Lib Magazine, there's already DLF gatherings, there's already LITA, etc -- those who are fans of code4lib like it because of something unique about it, and want it to change in some ways and not in other ways. And we probably don't all agree on those ways. But it would be disingenuous to pretend that everyone in code4lib has no opinion about what sorts of topics and discussions should take place at confs or on the listserv etc. But I've still never seen anyone say that any person or type of person is unwelcome! Yeah, there is some tension here, because of course what ends up creating the 'what' is the 'who' who are there.
I am not afraid to say that code4lib would not be able to remain code4lib unless the _majority_ of participants were coders, broadly understood (writing HTML is writing code; writing anything to be interpreted by a computer is writing code). But either that will happen or it won't, there's no way to force it. (And personally, I'm not afraid to say that code4lib would not be able to remain code4lib for ME, if the _majority_ of participants become people who work mostly on digital repository or archival areas, as is true of so many other library technology venues.) From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Christie Peterson [cpeter...@jhu.edu] Sent: Thursday, November 29, 2012 9:13 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] What is a coder? I think my tweet yesterday may have been partially responsible for raising this question in Mark's mind. I wrote: Debating registering for c4l since I'll be getting -- at most -- 50% reimbursement for costs , well, I'm not a coder. Thoughts? When I wrote this, I was using coder in the sense that Jonathan used it: A coder is someone who writes code, naturally. :) and also in the sense that Henry mentioned: sysadmin types who do a minimal amount of literal coding but self-identify as technologists. I profess to be neither, yet many of the topics on this year's lineup are directly relevant to my work. My professional identity is, first, as an archivist. This belies a lot of tech-heavy activities that I'm involved with, however: management of born-digital materials, digital preservation, designing/building a digital repository, metadata management, interface design, process improvement and probably a few other things that just don't happen to be what I'm thinking about at this particular moment.
So although I'm not a coder in the sense that I defined above, it's essential for my work that I understand a lot about the technical work of libraries and that I can communicate and collaborate with the true coders. As my tweet hinted at, this puts me in an odd place in terms of library financial support for attendance at technology-focused conferences. While the coders I work with (hi guys!) get fully funded to attend code4lib and similar conferences, I don't. If this were training in the sense of a seminar or a formal class on the exact same topics, I would be eligible for full funding, but since it's a conference, it's funded at a significantly lower level. I'll gladly take suggestions anyone has for arguments about why attendance at these types of events is critical to successfully doing my work in a way that, say, attending ALA isn't -- and why, therefore, they should be supported at a higher funding rate than typical library conferences. Any non-coders successfully made this argument before? Cheers, Christie S. Peterson Records Management Archivist Johns Hopkins University The Sheridan Libraries
Re: [CODE4LIB] tech vs. nursing
On 11/29/2012 4:19 PM, Chris Fitzpatrick wrote: departments in kinda interesting ways. There now seems to be things like Metadata or Systems groups that are distinct from Digital Repository or Applications groups. Catalogers and the people who work on the ILS are often completely segregated from the people who work on the new flashy grant-funded projects. Yes, this isn't new, and it is a problem. The former, it kinda seems to me, tends to have more women members while the latter is often lacking. Code4Lib draws mostly from people working in these new-ish groups, Code4Lib didn't use to; when I attended the second code4lib conf, the vast majority of the presentations and presenters were NOT about grant-funded work or digital repository work, and the majority of people I met at Code4Lib were not working on such things. I miss that. Code4Lib was in fact the only place I knew of for people working on traditional library use cases, not on grant-funded projects, trying to innovate with technology and keep libraries relevant.
Re: [CODE4LIB] What about Code4Lib4Women?
Sounds possibly interesting. Other than a word, what would that be exactly, and what would be the goals of it? Do you mean a different conference, or listserv, or what? On 11/28/2012 3:34 PM, Salazar, Christina wrote: And/or Code4Lib4[I hate that word minority, but cannot think of another for here, but maybe you get what I mean] Not trying to splinter, but that might be one way to encourage diversity but again, without implication that ANYONE would be excluded. (Inspired by http://www.meetup.com/Los-Angeles-Womens-Ruby-on-Rails-Group/ ) Christina Salazar Systems Librarian John Spoor Broome Library California State University, Channel Islands 805/437-3198 [Description: Description: CI Formal Logo_1B grad_em signature]
Re: [CODE4LIB] What is a coder?
A coder is someone who writes code, naturally. :) Code is something intended to be interpreted or executed by a computer or a computer program. I think everyone agrees that anyone is welcome at code4lib. However, many want to keep code4lib conference presentations and community focused on technical matters and matters of interest to coders. These things are not necessarily contradictory. From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Mark A. Matienzo [mark.matie...@gmail.com] Sent: Wednesday, November 28, 2012 10:02 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] What is a coder? Some discussion (both on-list and otherwise) has referred to coders, and some discussion as such has raised the question whether non-coders are welcome at code4lib. What's a coder? I'm not trying to be difficult - I want to make code4lib as inclusive as possible. Mark A. Matienzo m...@matienzo.org Digital Archivist, Manuscripts and Archives, Yale University Library Technical Architect, ArchivesSpace
[CODE4LIB]
On 11/27/2012 4:46 PM, Shaun Ellis wrote: I agree with Tom. If you look at the links Andromeda sent earlier in this thread, both conference organizers reported dramatic increases in the number of under-represented presenters simply by 1) making the proposal authors anonymous during voting Hmm, is the proposal author a legitimate (or illegitimate) criterion to judge proposals on? I tend to think it's actually legitimate; there are some people I know will give a valuable presentation because of who they are, and others whose expertise I might trust on some topics but not others. I don't think this is illegitimate, and I wouldn't want to take this information away from voters. We are, after all, voting not just on a topic, but on a topic to be presented by a certain person or people. (I would be quite fine with having some of the program decided upon by the program committee rather than by the voters at large, though! Using a variety of criteria. In addition to addressing issues of diversity in presenters, I think it could also improve the quality of presentations, and topical diversity as well.)
[CODE4LIB] Your proposal wasn't accepted? Consider submitting to the Code4Lib Journal?
Are you sad your proposal wasn't accepted to the Code4Lib Conference? Please consider submitting it as an article to the Code4Lib Journal instead! In fact, you can submit something as an article even if you are presenting at the conf too -- but especially if you aren't, getting an article published in the Journal can be an alternate way to get your ideas out to the Code4Lib audience -- maybe get them out to even more people than would see them at the conference, and in something that stays on the web for future generations of systems librarians too! It isn't necessarily that much harder to prepare an article than to prepare a presentation. Whatever you would have included in your presentation, you just need to set it down in narrative text instead. It needs to be clear and legible, and we won't just take a slide deck as an article -- but we do accept articles written informally, if they are clear and convey good information, and articles can include screenshot and screencast components. Share what you're doing with your peers, at the Code4Lib Journal! There are a number of proposed presentations that didn't make the cut that I would have liked learning about -- I hope you submit them as articles to the Journal instead! We accept submissions at any time, on a rolling basis, as either abstracts or first drafts: http://journal.code4lib.org/call-for-submissions The next proposal cut-off date is Jan 7th, for the 20th issue to be published in the spring. But don't procrastinate and wait: avoid the rush, get your proposal in now while it's fresh in your head!
Re: [CODE4LIB] COinS
On 11/20/2012 8:25 PM, Godmar Back wrote: Could you elaborate on your belief that COinS is actually illegal in HTML5? Why would that be so? Yeah, thanks for calling me on that, I was wrong! Not sure where I got that idea, but it does not seem to be illegal. (Did some earlier version of HTML5 get rid of the 'title attribute on every element'? Or was I just confused?) Perhaps what I was thinking of is that some people see an accessibility issue in using the 'title' attribute for non-human-readable data, like COinS does. As the title attribute theoretically provides extra human-readable content that a user-agent can display in some cases, filling it with non-human-readable data may confuse people. I seem to recall _someone_ complaining about a COinS title attribute on these grounds in some app I develop, but I can't remember the details. Here's others mentioning that potential problem: * http://en.wikipedia.org/wiki/Microformat#Accessibility * http://www.bbc.co.uk/blogs/radiolabs/2008/06/removing_microformats_from_bbc.shtml However, in practice, that seems to be a problem more likely, if at all, with title attributes on abbr elements, not span elements like COinS uses. If you google around, you find a lot of people complaining about the reverse problem -- don't assume that adding a title attribute to your span provides an accessible description (say, to visually impaired users), because most assistive user-agents in fact ignore the title attribute! Still, it's kind of messy to use a title attribute for non-human-readable purposes, and that messiness is a large part of the motivation for HTML5 microdata. - Godmar On Tue, Nov 20, 2012 at 5:20 PM, Jonathan Rochkind rochk...@jhu.edu wrote: It _IS_ an old unused metadata format that should be replaced by something else (among other reasons because it's actually illegal in HTML5), but I'm not sure there is a something else with the right balance of flexibility, simplicity, and actual adoption by consuming software.
But COinS didn't have a whole lot of adoption by consuming software either. Can you say what you think the COinS you've been adding are useful for, what they are getting used for? And what sorts of 'citations' you were adding them for? For my own curiosity, and because it might help answer if there's another solution that would still meet those needs. But if you want to keep using COinS -- creating a COinS generator like OCLC's no-longer-existing one is a pretty easy thing to do; perhaps some code4libber reading this will be persuaded to find the time to create one for you and others. If you have a server that could host it, you could offer that. :) On 11/20/2012 4:47 PM, Bigwood, David wrote: I've used the COinS Generator at OCLC for years. Now it is gone. Any suggestions on how I can get an occasional COinS for use in our bibliography? Do any of the citation managers generate COinS? Or is this just an old unused metadata format that should be replaced by something else? Thanks, Dave Bigwood dbigw...@hou.usra.edu Lunar and Planetary Institute
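To give a sense of why a COinS generator is "a pretty easy thing to do": a COinS is just an empty span with class Z3988 whose title attribute carries a URL-encoded OpenURL ContextObject key/value string. Here's a minimal Ruby sketch (the method name and the book-format fields chosen are illustrative, not any particular generator's API):

```ruby
require "uri"
require "cgi"

# Minimal COinS generator sketch: build the Z39.88 ContextObject
# key/value string and wrap it in the empty span that COinS consumers
# (e.g. Zotero) look for. Fields follow the OpenURL book format.
def coins_span(btitle:, aulast:, date: nil, isbn: nil)
  pairs = {
    "ctx_ver"     => "Z39.88-2004",
    "rft_val_fmt" => "info:ofi/fmt:kev:mtx:book",
    "rft.btitle"  => btitle,
    "rft.aulast"  => aulast,
  }
  pairs["rft.date"] = date if date
  pairs["rft.isbn"] = isbn if isbn
  kev = URI.encode_www_form(pairs)               # percent-encode the values
  # HTML-escape the whole string (turns '&' into '&amp;') so it is a
  # valid attribute value:
  %(<span class="Z3988" title="#{CGI.escapeHTML(kev)}"></span>)
end

puts coins_span(btitle: "Moby-Dick", aulast: "Melville", date: "1851")
```

Wrapping something like this in a tiny web form would reproduce what the OCLC generator did.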
[CODE4LIB] ruby gem for testing IP addresses for inclusion in sets of non-contiguous address ranges
Something we university library folks often need to do, even though it's kind of a ridiculous design. I wrote a ruby convenience gem for it that some may find useful, basically just a convenience method around the ruby IPAddr stdlib, which does the heavy lifting. https://github.com/jrochkind/ipaddr_range_set
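For anyone curious, here is the underlying stdlib pattern the gem wraps (this is the idea, not the gem's actual API -- see its README for that; the network ranges below are made up for illustration): an IPAddr built with a netmask responds to #include?, so a "range set" is just a list of networks checked in turn.

```ruby
require "ipaddr"

# Hypothetical licensed IP ranges -- e.g. a campus network and a VPN pool.
campus_ranges = [
  IPAddr.new("128.220.0.0/16"),   # illustrative campus network
  IPAddr.new("10.8.0.0/24"),      # illustrative VPN pool
]

# True if the given address falls inside any of the (non-contiguous) ranges.
def in_ranges?(ranges, ip_string)
  ip = IPAddr.new(ip_string)
  ranges.any? { |range| range.include?(ip) }
end

puts in_ranges?(campus_ranges, "128.220.4.21")  # => true
puts in_ranges?(campus_ranges, "192.168.1.1")   # => false
```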
Re: [CODE4LIB] Code4Lib Mid-Atlantic Google Group
All it takes is doing it. You can create a wiki page on the code4lib wiki if you want, next to the other regional ones. The wiki is editable by anyone. Then you just have to find other people who live around you, and get them to do code4lib-like activities with you using the code4lib name. That's all there is. On 11/8/2012 3:12 PM, Akerman, Laura wrote: Another newbie (can't say how innocent) is interested in the answer to this and seeing that there isn't one for the Southeast, wonders what it would take to create one? Or, if that's out of reach for now, whether a visitor from below the Mason-Dixon line would be unwelcome or not to one of the other regions' meetings? We do do code down here sometimes... Laura Laura Akerman Technology and Metadata Librarian Room 128, Robert W. Woodruff Library Emory University, Atlanta, Ga. 30322 (404) 727-6888 lib...@emory.edu -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Michael Schofield Sent: Thursday, November 08, 2012 8:45 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Code4Lib Mid-Atlantic Google Group Hi David [and all], Innocent newbie question: I see there is a Code4Lib NE and Mid-Atlantic - does the latter descend so far into Florida? Is there a Code4Lib SE? A better question: is there a more appropriate place for me to have looked this up? Michael -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Mark Wilhelm Sent: Thursday, November 08, 2012 6:09 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Code4Lib Mid-Atlantic Google Group David, When I access this group I get a you cannot view topics in this forum message. Thanks once again for hosting the conference a few weeks back. --Mark On Wed, Oct 24, 2012 at 11:44 AM, David Uspal david.us...@villanova.edu wrote: All, Thanks to everyone who made the Code4Lib Mid-Atlantic kick-off meeting a success! 
To keep the ball rolling, I've set up a temporary home base at Google Groups so we can talk about local issues (our next informal meetup, listservs, etc) without flooding inboxes. You can join the growing list here: https://groups.google.com/forum/#!forum/code4lib-mid-atlantic David K. Uspal Technology Development Specialist Falvey Memorial Library Phone: 610-519-8954 Email: david.us...@villanova.edu -- Mark Wilhelm E-Mail: markc...@gmail.com Twitter: @markcwil Facebook: facebook.com/markcwil Read the Information Science News Blog at: http://infoscinews.blogspot.com/
Re: [CODE4LIB] A [Wordpress-based] Alerts Dashboard - Library Closings, etc.
That's a really cool idea Jason! I highly encourage you to write it up for the Code4Lib Journal, sounds like a great (possibly short) article for the journal. Do you do anything with dates, so 'old' alerts/notices aren't shown anymore? Sounds like no, you just display the last 3, in case people want to look back at history too? Would love to see some screenshots or webcasts or examples of it in action -- or write a code4lib journal article to share with everyone! On 11/7/2012 11:31 AM, Jason Griffey wrote: We aren't right now...all posts just go where they go. But it's trivial to break out a category-specific RSS feed in Wordpress, so that would be easily done. We typically update the notice instead of taking it down. Good blog form, and all that. For most alert items (Database down, etc) the display just shows the last 3-5 items, and so stuff rolls off quickly. If not, the update generally takes care of it. Jason On Wed, Nov 7, 2012 at 9:37 AM, Michael Schofield mschofi...@nova.edu wrote: Hey Jason, Are you watching for different categories--closings, emergencies, weather - etc.--and, also, how are you determining when to take down the notice (if at all)? Sent from my iPhone On Nov 7, 2012, at 10:26 AM, Jason Griffey grif...@gmail.com wrote: We run a Wordpress multisite setup here at MPOW, and have two different blogs that we use for this type of purpose: an Alerts blog for in-house alert needs, and a News blog for public-facing announcements. We just use the RSS feed to push the alerts where needed, and there's certainly no shortage of RSS collection/parsing libraries. I'm partial to Magpie (http://magpierss.sourceforge.net/) but only because I've had years of using it. We even recently moved to using Growl for Windows with an RSS plugin to do heads up alerts on staff/faculty PCs, so that when something is posted to the Alerts blog, all staff machines get an impossible-to-ignore alert overlay on their screens. 
We will likely be doing a similar thing for Emergency use and the public machines. Jason On Wed, Nov 7, 2012 at 9:12 AM, Michael Schofield mschofi...@nova.edu wrote: Hey everyone, I've been toying with the idea of making something because I can't seem to find a free alternative, but I thought I'd do my due diligence and pick your brains. I'm open to any alternatives to the following, but I'm specifically looking for a free option with an API. Scenario: our main website lives on the university's server, which turns out to be a very dull playground: HTML/CSS/JS only. This means there are about 150 static files that I'm presently rolling into a WP Network living on our own boxes, and our own domain (we've been waiting for the last year for a university-wide CMS, but we just don't want to hold our breath any longer :-)), but the main site, the landing page, will always be static. This means that whenever there's an early closure, a hurricane watch, or some other announcement, someone has to submit a ticket and then I have to make a change. My goal is to cut me, the middleman, out of the process. My potential project: So what I was thinking was jury-rigging a Wordpress theme into an alerts dashboard for managers, directors, and so on. I want to empower the Circulation manager to log in, make an announcement, and be done with it. For all the departmental and other sites that live on the WP Network, I'd write and install a corresponding alerts plugin that watches the JSON API for an alert and, if one exists, displays it. For our static sites, I'd toss in a jQuery plugin that did the same. My question: this seems like something that's been done before! Has it? If not, anyone want to collaborate on github? All the best, Michael Schofield(@nova.edu) | Web Services Librarian | (954) 262-4536 Alvin Sherman Library, Research, and Information Technology Center Hi!
Hit me up any time, but I'd really appreciate it if you report broken links, bugs, your meeting minutes, or request an awesome web app over on the Library Web Services http://staff.library.nova.edu/pm site.
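Consuming such an alerts feed from a script is straightforward with Ruby's stdlib RSS parser. A minimal sketch, with an invented feed, mirroring the "show the last few items so stuff rolls off" display described above:

```ruby
require "rss"

# Invented stand-in for a WordPress alerts feed; in practice you would
# fetch the feed URL with open-uri or Net::HTTP.
xml = <<~XML
  <?xml version="1.0"?>
  <rss version="2.0">
    <channel>
      <title>Library Alerts</title>
      <link>https://alerts.example.edu/</link>
      <description>Service alerts</description>
      <item><title>Database X is down</title></item>
      <item><title>Early closing Friday</title></item>
    </channel>
  </rss>
XML

# Second argument disables validation, handy for sloppy real-world feeds.
feed = RSS::Parser.parse(xml, false)

# Display only the most recent handful of items, as described in the thread.
alerts = feed.items.first(3).map(&:title)
# => ["Database X is down", "Early closing Friday"]
```

The same few lines work whether the feed comes from an Alerts blog, a News blog, or any other WordPress category-specific RSS feed.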
Re: [CODE4LIB] one tool and/or resource that you recommend to newbie coders in a library?
http://journal.code4lib.org On 11/1/2012 4:24 PM, Bohyun Kim wrote: Hi all code4lib-bers, As coders and coding librarians, what is ONE tool and/or resource that you recommend to newbie coders in a library (and why)? I promise I will create and circulate the list and make it into a Code4Lib wiki page for collective wisdom. =) Thanks in advance! Bohyun --- Bohyun Kim, MA, MSLIS Digital Access Librarian bohyun@fiu.edu 305-348-1471 Medical Library, College of Medicine Florida International University http://medlib.fiu.edu http://medlib.fiu.edu/m (Mobile)
[CODE4LIB] Q: Discovery products and authentication (esp Summon)
Looking at the major 'discovery' products, Summon, Primo, EDS ...all three will provide some results to un-authenticated users (the general public), but have some portions of the corpus that are restricted and won't show up in your results unless you have an authenticated user affiliated with the customer's organization. So when we look around on the web for Summon and Primo examples, we can for instance do some sample searches there even without logging in or being affiliated with the particular institution. But we are only seeing a subset of results there, not actually seeing everything, since we didn't auth. But most of these examples I look at don't, in their UI, make this particularly clear. This leads me to wonder whether, in actual use, even at customers who _could_ log in to see complete results, anyone ever does. So I'm very curious to get an answer from any existing customers as to this issue. Do the end-users realize they will get more complete results if they log in? Do you have any numbers (or other info, even if not cold stats) on how many end-users choose to log in to see more complete results? If nobody ever authenticates to see more complete results, then the subset available to un-authenticated users essentially _is_ the product; the extra stuff that nobody ever sees is kinda irrelevant, no? Anyone who is a current customer of Summon/Primo/EDS want to say anything on this topic? Would be helpful.
Re: [CODE4LIB] Q: Discovery products and authentication (esp Summon)
Right, thanks, but you're missing my point/question. A significant portion of all of our libraries' use these days is by patrons that are off-campus and will not be IP-authenticated (Unless you have all patrons use a VPN or something before using library services?) Those off-campus patrons at Dartmouth, do they just always get the limited results available to non-auth end-users, or do you encourage them to log in (and if so, any idea how many do?) On 10/24/2012 1:54 PM, Mark Mounts wrote: We have Summon at Dartmouth College. Authentication is IP based so with a Dartmouth IP address the user will see all our licensed content. There is also the option to see all the content Summon has beyond what we license by selecting the option "Add results beyond your library's collection" Mark -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jonathan Rochkind Sent: Wednesday, October 24, 2012 12:16 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] Q: Discovery products and authentication (esp Summon) Looking at the major 'discovery' products, Summon, Primo, EDS ...all three will provide some results to un-authenticated users (the general public), but have some portions of the corpus that are restricted and won't show up in your results unless you have an authenticated user affiliated with the customer's organization. So when we look around on the web for Summon and Primo examples, we can for instance do some sample searches there even without logging in or being affiliated with the particular institution. But we are only seeing a subset of results there, not actually seeing everything, since we didn't auth. But most of these examples I look at don't, in their UI, make this particularly clear. This leads me to wonder whether, in actual use, even at customers who _could_ log in to see complete results, anyone ever does. So I'm very curious to get an answer from any existing customers as to this issue.
Do the end-users realize they will get more complete results if they log in? Do you have any numbers (or other info, even if not cold stats) on how many end-users choose to log in to see more complete results? If nobody ever authenticates to see more complete results then the subset available to un-authenticated users essentially _is_ the product, the extra stuff that nobody ever sees is kinda irrelevant, no? Anyone who is a current customer of Summon/Primo/EDS want to say anything on this topic? Would be helpful.
Re: [CODE4LIB] Q: Discovery products and authentication (esp Summon)
On 10/24/2012 2:04 PM, Ben Florin wrote: We use Primo, but we've never bothered with their restricted search scopes. Apparently the answer to my question is that nobody has thought about this before, heh. Primo, by default, will suppress some content from end-users unless they are authenticated, no? Maybe that's what restricted search scopes are? I'm not talking about your locally indexed content, but about the PrimoCentral index of scholarly articles. At least I know the Primo API requires you to tell it if end-users are authenticated or not, and suppresses some results if they are not. I assume Primo 'default' interface must have the same restrictions? Perhaps the answer to my question is that at most discovery customers, off-campus users always get the 'restricted' search results, have no real way to authenticate, and nobody's noticed yet!
Re: [CODE4LIB] Q: Discovery products and authentication (esp Summon)
Good to have some numbers, thanks! Even taking your largest number, 25% + 12% == 37% coming from on-campus is definitely less than half, and not 'most' use being from on-campus -- which does not surprise me at all; it's what I would expect. This is an interesting discussion, I think. Thanks all. (Except for Ross and that other guy having a flamewar about things entirely unrelated to the topic! Just kidding, we love you Ross and that other guy. But yeah, unrelated to the topic.) From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of David Friggens [frigg...@waikato.ac.nz] Sent: Wednesday, October 24, 2012 9:15 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Q: Discovery products and authentication (esp Summon) a) most queries come from on-campus Really? Are people just assuming this, or do they actually have data? That would surprise me for most contemporary American places of higher education. For the last two months, 25.4% of our Summon traffic has come from the IP addresses we've given as on campus, according to the stats Serials Solutions provides. Note that another 11.8% came from the local ISP that provides wireless for our students, so most of that would be on campus at other institutions. But it may very well be the extra restricted content is not important and nobody minds its absence. (Which would make one wonder why the vendor bothers to spend resources putting it in there!) That's been our view (though you're making me think we should perhaps try and understand better what the difference is). The A&I results are interesting. EDS seems to promote results from their own A&I databases more highly than I would expect, and they're certainly noticeable when blanked out with "cannot be displayed to guests". When Summon started showing A&I results there was some interesting discussion on the mailing list - they're not immediately accessible, so they're arguably not in the library's collection.
And Summon (as does Primo) has an option to "add results beyond your library's collection". There was some argument on the other side, that A&I results are important to include, so it seems that there is librarian pressure as well as commercial/licence pressure. David
Re: [CODE4LIB] VPN EZ Proxy
VPN does what EZProxy does already -- makes web access appear to come from an on-campus address -- but for ALL web access, not just access that follows links from your web pages using EZProxy. This assumes outgoing traffic from users on the VPN will be on an IP address recognized as 'licensed' by your vendors, which it typically is. If people are using VPNs and are happy with them, that's actually a MORE reliable solution than EZProxy. Rather than tell them to turn off their VPN, I would ensure you have EZProxy configured not to interfere with it, if possible. If they are on a VPN, and then ALSO go through EZProxy... it should _work_, but it'll hurt performance, as all traffic is effectively being sent through two proxies, for no reason. You should instead configure your EZProxy so that when a client is on an IP address recognized as the VPN, EZProxy simply redirects without proxying instead of double-proxying the traffic. On 10/18/2012 1:46 PM, Joselito Dela Cruz wrote: Hi All, We use EZ Proxy for authentication and we always tell the staff who use VPN to turn their VPN off so they can access our databases. Is this the right way? I looked around for answers and could not find any, so I thought I would throw this in here. Thanks for any feedback. Jay Dela Cruz
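EZProxy's ExcludeIP directive is the usual way to get the pass-through behavior described above: clients arriving from an excluded address range are redirected straight to the vendor's site rather than proxied. A hypothetical ezproxy.cfg fragment (the VPN address pool below is invented for illustration):

```
# Treat clients arriving from the VPN pool as already-licensed:
# EZProxy redirects them directly to the database instead of proxying.
ExcludeIP 10.8.0.0-10.8.255.255
```

Check your own VPN's outbound address range with your network folks before adding it; if the VPN exits through addresses your vendors don't recognize, excluding it would break access rather than streamline it.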
Re: [CODE4LIB] Q.: software for vendor title list processing
I've always been a fan of ONIX for SOH, although I've never had the chance to use it -- the spec is written nicely and, based on my experience with this stuff, it actually accomplishes the goal of a machine-readable statement of serial holdings (theoretically useful for print or online holdings) well. KBART, I have some concerns about, when it comes to holdings. Is there a place to send feedback to KBART? Just on a quick skim of the parts of interest to me, I am filled with alarm at how much this misses the point: "we recommend that the ISO 8601 date syntax should be used... For simplicity, '365D' will always be equivalent to one year, and '30D' will always be equivalent to one month, even in leap years and months that do not have 30 days." It totally misses the point of ISO 8601 to allow/encourage this when '1Y' and '1M' are available -- dealing with calendar dates is harder than one might naively think, and by trying to 'improve' on ISO 8601 like this, you just create a mess of ambiguous and difficult-to-deal-with data. On 10/17/2012 5:11 AM, Owen Stephens wrote: Are there any examples of data in this format in the wild we can look at? Also given KBART and ONIX for Serials Online Holdings have NISO involvement, is there any view on how these two activities complement each other? Thanks, Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 17 Oct 2012, at 09:47, Michael Hopwood mich...@editeur.org wrote: Hi Godmar, There is also ONIX for Serials Online Holdings (http://www.editeur.org/120/ONIX-SOH/). I'm copying in Tim Devenport who might say more. Best wishes, Michael -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Owen Stephens Sent: 16 October 2012 23:09 To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Q.: software for vendor title list processing I'm working on the JISC KB+ project that Tom mentioned.
As part of the project we've been collating journal title lists from various sources. We've been working with members of the KBART steering group and have used KBART where possible, although we've been collecting data not covered by KBART. All the data we have at this level is published under a CC0 licence at http://www.kbplus.ac.uk/kbplus/publicExport - including a csv that uses the KBART data elements. The focus so far has been on packages negotiated by JISC in the UK - although in many cases the title lists may be the same as are made available in other markets. We also include what we call 'Master lists' which are an attempt to capture the complete list of titles and coverage offered by a content provider. We'd very much welcome any feedback on these exports, and of course be interested to know if anyone makes use of them. So far a lot of the work on collating/converting/standardising the data has been done by hand - which is clearly not ideal. In the next phase of the project the KB+ project is going to work with the GoKB project http://gokb.org - as part of this collaboration we are currently working on ways of streamlining the data processing from publisher files or other sources, to standardised data. While we are still working on how this is going to be implemented, we are currently investigating the possibility of using Google/Open Refine to capture and re-run sets of rules across data sets from specific sources. We should be making progress on this in the next couple of months. Hope that's helpful Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 16 Oct 2012, at 20:23, Tom Pasley tom.pas...@gmail.com wrote: You might also be interested in the work at http://www.kbplus.ac.uk . The site is up at the moment, but I can't reach it for some reason...
they have a public export page which you might want to know about http://www.kbplus.ac.uk/kbplus/publicExport Tom On Wed, Oct 17, 2012 at 8:12 AM, Jonathan Rochkind rochk...@jhu.edu wrote: I think KBART is such an effort. As with most library standards groups, there may not be online documentation of their most recent efforts or successes, but: http://www.uksg.org/kbart http://www.uksg.org/kbart/s5/guidelines/data_format On 10/16/2012 2:16 PM, Godmar Back wrote: Hi, at our library, there's an emerging need to process title lists from vendors for various purposes, such as checking that the titles purchased can be discovered via discovery system and/or OPAC. It appears that the formats in which those lists are provided are non-uniform, as is the process of obtaining them. For example, one vendor - let's call them Expedition Scrolls - provides title lists for download to Excel, but which upon closer inspection turn out to be HTML tables. They are encoded using an odd mixture of CP1250 and HTML entities. Other vendors use entirely different formats.
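The leap-year pitfall called out above is easy to demonstrate with Ruby's Date arithmetic, where `+ 365` adds fixed days while `>> 12` adds calendar months:

```ruby
require "date"

# "365D" is not a calendar year: 2012 is a leap year, so a fixed 365-day
# duration lands one day short of the same calendar date a year later.
start = Date.new(2012, 1, 15)

start + 365   # => 2013-01-14 (fixed-day duration, the KBART "365D" reading)
start >> 12   # => 2013-01-15 (true calendar year, the ISO 8601 'P1Y' reading)
```

This is exactly the ambiguity that redefining '365D' as "one year" bakes into the data: consumers can no longer tell whether a stated coverage span means fixed days or calendar dates.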
Re: [CODE4LIB] Q.: software for vendor title list processing
I think KBART is such an effort. As with most library standards groups, there may not be online documentation of their most recent efforts or successes, but: http://www.uksg.org/kbart http://www.uksg.org/kbart/s5/guidelines/data_format On 10/16/2012 2:16 PM, Godmar Back wrote: Hi, at our library, there's an emerging need to process title lists from vendors for various purposes, such as checking that the titles purchased can be discovered via discovery system and/or OPAC. It appears that the formats in which those lists are provided are non-uniform, as is the process of obtaining them. For example, one vendor - let's call them Expedition Scrolls - provides title lists for download to Excel, but which upon closer inspection turn out to be HTML tables. They are encoded using an odd mixture of CP1250 and HTML entities. Other vendors use entirely different formats. My question is whether there are efforts, software, or anything related to streamlining the acquisition and processing of vendor title lists in software systems that aid in the collection development and maintenance process. Any pointers would be appreciated. - Godmar
Re: [CODE4LIB] formatting citation output programmatically
There are a billion different citation formats with their own rules. I don't think there is any simple answer to the question you ask. On 10/11/2012 2:45 PM, William Gunn wrote: Hi list! I have a technical question about formatting citation output which some of you may have dealt with in the past. I see journal names and their abbreviations listed three different ways: ALL CAPS no periods: http://images.webofknowledge.com/WOK46/help/WOS/A_abrvjt.html Proper Case, with periods: http://www.lib.berkeley.edu/BIOS/j_abbr.html Proper Case, no periods: http://home.ncifcrf.gov/research/bja/journams_a.html As far as I'm aware, citations in published papers should always be proper case, but are there any cases where a journal should be cited without periods in the abbreviated form? I'm aware of the edge cases like PLOS, JAMA, BMJ, but what I'm wondering is if anyone knows of any instances where a journal which is normally abbreviated as Anal. Biochem. would instead be formatted as Anal Biochem (without periods) in the references list/bibliography for a paper? If anyone has dealt with this issue in the past, I'd love to hear what you came up with. Thanks! William Gunn +1 646 755 9862 http://synthesis.williamgunn.org/about/ Support free access to scientific journal articles arising from taxpayer-funded research: http://wh.gov/6TH
Re: [CODE4LIB] Seeking examples of outstanding discovery layers
On 9/20/2012 1:39 PM, Karen Coyle wrote: So, given this, and given that in a decent-sized catalog users regularly retrieve hundreds or thousands of items, what is the best way to help them grok that set given that the number of records is too large for the user to look at them one-by-one to make a decision? Can the fact that the data is in a database help users get a feel for what they have retrieved without having to look at every record? I've often felt that, if they can be properly presented, facets are a really great way to do this. Facets (with hit counts next to every value) give you a 'profile' of a result set that is too large to get a sense of otherwise; they give you a sort of descriptive statistical summary of it. When the facets are 'actionable', as they usually are, they also let you drill down to particular aspects of the giant result set you are interested in, and get a _different_ 2.5 screens of results you'll look at. Of course, library studies also often show that our users don't use the facets, heh. But there are a few conflicting studies that show they are used a significant minority of the time. I think it may have to do with UI issues of how the facets are presented. It's also important to remember that it doesn't necessarily represent a failure if users don't engage with the results beyond the first 2.5 screens -- it may mean they got what they wanted/needed in those first 2.5 screens. And likewise, that it's okay for us libraries to develop features which are used only by significant minorities of our users (important to remember that what our logs show is really significant minorities of _uses_: all users using a feature 1% of the time can show up the same as 1% of users using a feature 100% of the time).
We are not lowest common denominator: while we need to make our interfaces _usable_ by everyone (lowest common denominator, perhaps), it's part of our mission to provide functionality in those interfaces for especially sophisticated uses that won't be used by everyone all the time.
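The "descriptive statistical summary" idea amounts to a group-and-count over the result set. A toy Ruby illustration (records and facet names invented), just to make the notion of a facet with hit counts concrete:

```ruby
# A tiny stand-in for a large search result set; real facets would be
# computed by the search engine (e.g. Solr) over thousands of records.
results = [
  { format: "Book",    language: "English" },
  { format: "Book",    language: "German"  },
  { format: "Journal", language: "English" },
]

# A facet is just each distinct value with its hit count.
facet = results.group_by { |r| r[:format] }.transform_values(&:size)
# => {"Book"=>2, "Journal"=>1}
```

Making the facet 'actionable' then just means turning each key into a filter link that narrows the query to that value.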
Re: [CODE4LIB] Displaying TGN terms
From the examples you've given, how about: 1. Start with the first (most detailed) element in the hierarchy. 2. Moving up the hierarchy, add on the first inhabited place found, if any. 3. Continuing to move up the hierarchy, add on the first nation found, if any. On 9/17/2012 3:12 PM, ddwigg...@historicnewengland.org wrote: We use the Getty Thesaurus of Geographic Names for coding place names in our museum and archival cataloguing systems. We're currently struggling with the best way to display and make these terms searchable in our online database. Currently we're just displaying the term itself, which is flawed, because just seeing Springfield or Florence doesn't give the user enough information to figure out where something was really made. But we're finding that the number of variant place types in TGN makes it hard to figure out a concise way of indicating a more detailed place name that will work consistently across all entries in the thesaurus. For example, the full hierarchy for Florence (the one in Italy) is Florence (inhabited place), Firenze (province), Tuscany (region), Italy (nation), Europe (continent), World (facet) Neighborhoods and other local subdivisions can be even more of a mouthful: Notting Hill (neighborhood), Kensington and Chelsea (borough), London (inhabited place), Greater London (metropolitan area), England (country), United Kingdom (nation), Europe (continent), World (facet) Ideally I'd probably like to show the above as Florence, Italy and Notting Hill, London, England But I'm having trouble coming up with an algorithm that can consistently spit these out in the form we'd want to display given the data available in TGN. Would welcome any ideas or feedback on this. Thanks, David __ David Dwiggins Systems Librarian/Archivist, Historic New England 141 Cambridge Street, Boston, MA 02114 (617) 994-5948 ddwigg...@historicnewengland.org http://www.historicnewengland.org
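The three-step rule above can be sketched in Ruby (the data structure here is invented for illustration). Note that, as written, the rule yields "United Kingdom" rather than "England" for the Notting Hill example, since TGN types England as a country, not a nation, so it would need a tweak to match the ideal display exactly:

```ruby
# Each hierarchy is a list of [name, place_type] pairs, most specific first,
# as in the TGN examples quoted in the thread.
def display_name(hierarchy)
  parts = [hierarchy.first[0]]                        # 1. most detailed element
  rest  = hierarchy.drop(1)
  inhabited = rest.find { |_, type| type == "inhabited place" }
  parts << inhabited[0] if inhabited                  # 2. first inhabited place above it
  nation = rest.find { |_, type| type == "nation" }
  parts << nation[0] if nation                        # 3. first nation above it
  parts.join(", ")
end

florence = [["Florence", "inhabited place"], ["Firenze", "province"],
            ["Tuscany", "region"], ["Italy", "nation"],
            ["Europe", "continent"], ["World", "facet"]]
notting  = [["Notting Hill", "neighborhood"], ["Kensington and Chelsea", "borough"],
            ["London", "inhabited place"], ["Greater London", "metropolitan area"],
            ["England", "country"], ["United Kingdom", "nation"],
            ["Europe", "continent"], ["World", "facet"]]

display_name(florence) # => "Florence, Italy"
display_name(notting)  # => "Notting Hill, London, United Kingdom"
```

Treating 'country' as an acceptable substitute for 'nation' in step 3 (preferring whichever appears first moving up) would produce the "Notting Hill, London, England" form David asked for.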
Re: [CODE4LIB] Random Casual Poll: What abt. Web Services Should You Know?
Okay, here's my own reverse survey for you. :) What is 'web services'? What job description, role, or responsibilities does 'a librarian planning to work in web services' mean to you? Because I'm not sure myself, nor am I sure everyone else who uses that term agrees. My answer to your survey depends on what we're talking about. And I have NO idea what w/e means or stands for. On 9/10/2012 4:12 PM, Michael Schofield wrote: Hi everyone, Every so often in the library blogosphere I see posts dedicated to whether librarians should know how to code. The answer I usually give is awful - something like, Um. Probably. Anyway, since you all work with the web and/or library systems, I'm curious about your wizened answers. Here's the scenario: if a LIS student intending to work in web services (or w/e) asked your advice, what code / platforms / other skills would you recommend for success? I'll compile and share the results in a couple of weeks. All the best, Michael Schofield(@nova.edu) | Web Services Librarian Alvin Sherman Library, Research, and Information Technology Center Hi! Hit me up any time, but I'd really appreciate it if you report broken links, bugs, your meeting minutes, or request an awesome web app over on the Library Web Services http://staff.library.nova.edu/pm site.
Re: [CODE4LIB] U of Baltimore, Final Usability Report, link resolvers -- MIA?
not report on Encore in the final analysis. The study (and chapter) does offer findings on the other three discovery tools. There were six student groups in the course; each group studied two tools with the same user population (undergrad, graduate and faculty) so that each tool was compared against the other three with each user population overall. The .pdf that you found was the final report of one of those six groups, so it only addresses two of the four tools. The chapter is the only document that pulls the six portions of the study together. I would be happy to discuss this with any of you individually if you need more information. Thanks for your interest in the study. Lucy Holman, DCD Director, Langsdale Library University of Baltimore 1420 Maryland Avenue Baltimore, MD 21201 410-837-4333 - end insert Jonathan LeBreton Sr. Associate University Librarian Temple University Libraries Paley M138, 1210 Polett Walk, Philadelphia PA 19122 voice: 215-204-8231 fax: 215-204-5201 mobile: 215-284-5070 email: lebre...@temple.edu email: jonat...@temple.edu -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of karim boughida Sent: Tuesday, September 04, 2012 5:09 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] U of Baltimore, Final Usability Report, link resolvers -- MIA? Hi Tom, Top players are EDS, Primo and Summon... the only reason I see Encore in the mix is if you have other III products, which is not the case at the UBalt library. They now have WorldCat? Encore vs Summon is an easy win for Summon. Let's wait for Jonathan LeBreton (thanks BTW). Karim Boughida On Tue, Sep 4, 2012 at 4:26 PM, Tom Pasley tom.pas...@gmail.com wrote: Yes, I'm curious to know too! Due to database/resource matching or coverage perhaps (anyone's guess). Tom On Wed, Sep 5, 2012 at 7:50 AM, karim boughida kbough...@gmail.com wrote: Hi All, Initially EDS, Primo, Summon, and Encore were considered but only Encore and Summon were tested. Do we know why?
Thanks Karim Boughida On Tue, Sep 4, 2012 at 10:44 AM, Jonathan Rochkind rochk...@jhu.edu wrote: Hi helpful code4lib community, at one point there was a report online at: http://student-iat.ubalt.edu/students/kerber_n/idia642/Final_Usability_Report.pdf David Walker tells me the report at that location included findings about SFX and/or other link resolvers. I'm really interested in reading it. But it's gone from that location, and I'm not sure if it's somewhere else (I don't have a title/author to search for other than that URL, which is not in google cache or internet archive). Is anyone reading this familiar with the report? Perhaps one of the authors is reading this, or someone reading it knows one of the authors and can put me in touch? Or knows someone likely in the relevant dept at ubalt and can put me in touch? Or has any other information about this report or ways to get it? Thanks! Jonathan -- Karim B Boughida kbough...@gmail.com kbough...@library.gwu.edu
Re: [CODE4LIB] U of Baltimore, Final Usability Report, link resolvers -- MIA?
On 9/5/2012 9:04 AM, Emily Lynema wrote: Yes, there were (we used 360 Link during the testing). This is one of the reasons we turned on 1-Click about 6 months ago and have been fairly pleased with the results. What does "turn on 1-Click" mean with regard to Summon? This has turned into a somewhat interesting conversation. We all need to talk about this stuff more!
Re: [CODE4LIB] U of Baltimore, Final Usability Report, link resolvers -- MIA?
Ah, thanks. If you are thinking of using Summon with a different link resolver, you'd have to see if they provide a similar pass-through type service. I *think* that SFX does. SFX indeed does, but I think on the same basis as 360Link -- turn it on or off globally. Umlaut, the open source link resolver front-end that I develop, provides this feature with even more control -- a URL query param can be provided to turn it on for certain requests while it stays off globally. So for instance, if your Summon interface could send that special URL param with its OpenURLs, you could have it on for links from Summon but off for default links. Or if you were creating your own Summon interface with the Summon API, you could even have one link (say, off of the title) that did 1-Click, but another link (say, below the record) that gave you the full menu. Alternately, if you have no control of the linking like this, a feature could easily be added to Umlaut to turn 1-Click behavior on or off based on source ID (rfr_id in the OpenURL, or even HTTP referrer). I've been investigating Summon a bit with some trial access, and I have to say, just from a very basic surface investigation of this particular feature so far -- I am indeed quite impressed with Summon's index-enhanced linking. Some things will just NEVER work well with OpenURL (say, digital video or audio links), and Summon's index-enhanced linking also sometimes gets you to open access online copies that existing OpenURL link resolver products do a very poor job of discovering.
[CODE4LIB] U of Baltimore, Final Usability Report, link resolvers -- MIA?
Hi helpful code4lib community, at one point there was a report online at: http://student-iat.ubalt.edu/students/kerber_n/idia642/Final_Usability_Report.pdf David Walker tells me the report at that location included findings about SFX and/or other link resolvers. I'm really interested in reading it. But it's gone from that location, and I'm not sure if it's somewhere else (I don't have a title/author to search for other than that URL, which is not in google cache or internet archive). Is anyone reading this familiar with the report? Perhaps one of the authors is reading this, or someone reading this knows one of the authors and can put me in touch? Or knows someone likely in the relevant dept at ubalt who can put me in touch? Or has any other information about this report or ways to get it? Thanks! Jonathan
Re: [CODE4LIB] U of Baltimore, Final Usability Report, link resolvers -- MIA?
Ha, I'm terrible at google searching apparently, thanks Matt and Joe, I believe that is what I was looking for. code4lib++ On 9/4/2012 10:48 AM, Matthew LeVan wrote: It's like a google search challenge! Looks like they changed their student home link patterns... http://home.ubalt.edu/nicole.kerber/idia642/Final_Usability_Report.pdf Thanks, matt On Tue, Sep 4, 2012 at 10:44 AM, Jonathan Rochkind rochk...@jhu.edu wrote: Hi helpful code4lib community, at one point there was a report online at: http://student-iat.ubalt.edu/students/kerber_n/idia642/Final_Usability_Report.pdf David Walker tells me the report at that location included findings about SFX and/or other link resolvers. I'm really interested in reading it. But it's gone from that location, and I'm not sure if it's somewhere else (I don't have a title/author to search for other than that URL, which is not in google cache or internet archive). Is anyone reading this familiar with the report? Perhaps one of the authors is reading this, or someone reading this knows one of the authors and can put me in touch? Or knows someone likely in the relevant dept at ubalt who can put me in touch? Or has any other information about this report or ways to get it? Thanks! Jonathan
Re: [CODE4LIB] haititrust
There is a HathiTrust search API that you can use, in addition to RSS/OpenSearch. I can look up the details when I'm back at work next week if you can't find 'em googling. In fact, I think there are two separate HT APIs, one that searches HT fulltext and one that just searches metadata. I use the metadata searching one in production, and indeed use it to look up HT records by ISBN, LCCN, and OCLCnum. I am not sure if you can limit to just items your library owns using this API though. At a minimum (this may be obvious) your library would probably need to be a HT member, and have shared holdings information with HT -- otherwise HT has no idea which items your library owns. (My library is a HT member but has not yet shared holdings information with HT, because, well, we aren't able to identify our holdings reliably with OCLC numbers, which is how HT (reasonably) wants it.) The support/question link at the top right of all HT pages, contrary to usual expectations (heh), actually does usually get directed to the right person and get a response, even for technical questions. I'd give a shot to asking them directly. Jonathan From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Ford, Kevin [k...@loc.gov] Sent: Friday, August 03, 2012 12:20 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] haititrust Ideally, you shouldn't need the hathifiles. The HathiTrust search page links to an OpenSearch document [1], which promisingly identifies an RSS feed and a JSON serialization of the search results. Neither appears to work. In theory, doing as Jon says and then appending view=rss would get you an RSS feed. There is a contact email in the OpenSearch document you might try. FWIW, if you look at the search page HTML, there is a fixme note in an HTML comment, the same comment, incidentally, that also comments out the RSS feed link in the HTML. 
Yours, Kevin [1] http://catalog.hathitrust.org/Search/OpenSearch?method=describe -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jon Stroop Sent: Friday, August 03, 2012 11:15 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] haititrust You can do an empty query in their catalog, and use the Original Location facet to filter to a holding library. Programmatically, I'm not sure, but you'd probably need to use the Hathi files: http://www.hathitrust.org/hathifiles. -Jon On 08/03/2012 11:07 AM, Eric Lease Morgan wrote: If I needed/wanted to know what materials held by my library were also in the HathiTrust, then programmatically how could I figure this out? In other words, do you know of a way to query the HathiTrust and limit the results to items my library owns? --Eric Lease Morgan
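For reference, the "metadata searching" API mentioned above is HathiTrust's Bib API, which answers lookups by standard identifier and returns JSON. This helper just builds the documented URL shape; as noted in the thread, limiting results to your own library's holdings is a separate question it doesn't answer:

```ruby
require "uri"

# HathiTrust Bib API URL shape (per HT's public docs):
#   https://catalog.hathitrust.org/api/volumes/{brief|full}/{idtype}/{value}.json
ID_TYPES = %w[oclc lccn isbn issn htid recordnumber].freeze

def hathi_bib_url(id_type, value, detail: "brief")
  raise ArgumentError, "unknown id type: #{id_type}" unless ID_TYPES.include?(id_type)
  "https://catalog.hathitrust.org/api/volumes/#{detail}/" \
    "#{id_type}/#{URI.encode_www_form_component(value)}.json"
end

puts hathi_bib_url("oclc", "424023")
```

Fetch the resulting URL with any HTTP client; the response lists HT records and items for that identifier.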
Re: [CODE4LIB] haititrust
Not an answer to your question, but if you want to share, I'm curious what your use case is where you want to limit to items your library owns. If HathiTrust has 'em in fulltext -- why would it matter to your patrons if your library has a print copy or not? And if HT does not have them in fulltext, still, why would it matter to your patrons if your library has a print copy or not? From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Eric Lease Morgan [emor...@nd.edu] Sent: Friday, August 03, 2012 11:07 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] haititrust If I needed/wanted to know what materials held by my library were also in the HathiTrust, then programmatically how could I figure this out? In other words, do you know of a way to query the HathiTrust and limit the results to items my library owns? --Eric Lease Morgan
Re: [CODE4LIB] code4lib.org down?
Yeah, the whole server seems to be down, including planet.code4lib.org hosted there, etc. Anyone know whose attention we should bring this to? On 6/25/2012 8:30 AM, Ed Summers wrote: Paging Oregon State: do we know why code4lib.org isn't responding? http://code4lib.org/ HTTP requests currently seem to timeout. //Ed PS. Thanks to Carol Bean for noticing it, and bringing it up in #code4lib :-)
Re: [CODE4LIB] Academic libraries - Will dev for pay models?
It seems odd to me for the library to charge individual departments for special projects. Although I realize it can make sense and be reasonable in some cases, I think there are some dangers. I mean, the library is already funded to provide services to the rest of the university, right? EVERYTHING we do serves other schools and departments, that's what we do; almost all our customers are internal. Different universities have different ways of accounting for this -- the individual schools or departments may already have budget line items moving cash from their budget to the library, or the university may just take care of it. But either way, it's usually flat-rate funding for the library's budget. The Business School doesn't get better service than the Philosophy Dept because they've got a bigger budget; nor are schools/departments usually 'charged back' because their undergrads use the reference librarians more than other depts/schools do. Likewise, some features we develop serve some departments/schools more than others. If we realized there was a need to search/facet by MeSH (NLM Medical Subject) headings, and we weren't doing that yet, but we had the capability to do it -- would we only add that feature if the Medical School paid us? I realize that all of our universities are increasingly trying to subject their components to market discipline, making everything a fee-based transaction. I think our professional ethics should be to resist this -- it's true we can't do everything we might want and we need to prioritize -- but I think our professional ethics in a university library should be against giving better service to those parts of the university that can pay more. But, really, I just put this out as something to think about. I realize that in some cases it can make sense, and be reasonable and ethical. But I think care is warranted. 
Another thing to beware of with software development in particular -- is that software going to be running on your servers? Are you expected to maintain it as well? We who develop software realize that software is hardly ever one-and-done; software (like the library, per Ranganathan's last law) is a growing organism, and it takes constant care and feeding. Even if no features are ever added (and certainly people WILL ask for changes), it takes constant operational care just to keep the thing running, including patching dependencies for security vulnerabilities, as well as simple operational/hardware expenses, etc. If you charge once per project, but are responsible for maintaining the software indefinitely, that doesn't work even from a strictly budgetary perspective. With digital collections, for instance, if possible I think it'd make a lot more sense to support, as part of the library's mission and general budget, say, a general Omeka installation that anyone can use to create their own 'exhibition', and/or a general repository that anyone can use to store their digital artifacts, rather than charge individual projects per-project to develop (and then charge more per-year to maintain/support?). Even just on basic financial sustainability grounds. On 6/6/2012 4:24 PM, Eric Larson wrote: Hi Rosy, Thanks for your reply. I would greatly appreciate seeing your spreadsheets. We do an honorable amount of project estimation and time-tracking here, too. We always draft a Memorandum of Understanding -- an agreement for what work the library will provide on the project and a timetable for completing said work -- with our digital collection project clients. We try hard to stay focused on the deliverables in that document, but there's always some feature creep in development work. We do not have plans to charge back for development services, but wondered if other schools worked in such a way. 
The recent success of our new library catalog launch and future digital collection platform (Hi Blacklight folk) has momentarily increased interest in our born-digital digital collection efforts. There's also a campus-wide effort here at UW-Madison to raise awareness for Educational Innovation opportunities that might generate new revenue streams for the university. We're not used to charging for our services in the library, but some hypothetical partnerships could present the opportunity. I'm sure other public institutions are doing similar what-if revenue exercises: http://edinnovation.wisc.edu/ Thanks again and I'll ping you off list to chat more. Cheers, - Eric On Jun 6, 2012, at 11:28 AM, Rosalyn Metz wrote: Hey Eric, At GW we've been doing some cost estimates for projects. Essentially we pull together the team, figure out the different tasks that need to be accomplished, determine who will be working on those tasks, estimate hours necessary to do the work, and then use salaries to calculate the cost. Right now we're primarily doing this for digitization projects, but I've had experience doing this at other jobs (not in
Re: [CODE4LIB] MARC Magic for file
I have become recently unpleasantly aquainted with the world of Marc that is not Marc21, but is ISO 2709. What'll it do on ISO 2709? I might be able to dig up an example. I wonder if it'll claim it's Marc21 (not), or if it's Marc21 Non-confirming (no, it's not quite that either. It's ISO-2709 MARC that's not Marc21). If it just doens't know anything about it and says 'data', that's just fine, if it knows about Marc21 but not non-Marc21 ISO 2709. On 5/23/2012 3:48 PM, Ford, Kevin wrote: Does it work for bulk files? -- It passed on a file containing 215 MARC Bibs and on a file containing 2,574 MARC Auth records. Don't know if you consider these bulk, but there is more than 1 record in each file (caveat: file stops after evaluating the first line, so of the 2,574 Auth records, the last 2,573 could be invalid). It failed on a file containing all of LC Classification. I need to figure out why. Kevin, do you have examples of the output? -- I received MARC21 Bibliography and MARC21 Authority respectively. In theory, if Leader 20-23 are not 4500 then (non-conforming) should be appended to the identification. If requested, the mimetype - application/marc - should also be outputted. Rgds, Kevin -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ross Singer Sent: Wednesday, May 23, 2012 3:29 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MARC Magic for file Wow, this is pretty cool. Kevin, do you have examples of the output? Does it work for bulk files? I mean, I could just try this on my Ubuntu machine, but it's all the way downstairs... -Ross. On May 23, 2012, at 3:14 PM, Ford, Kevin wrote: I finally had occasion today (read: remembered) to see if the *nix file command would recognize a MARC record file. I haven't tested extensively, but it did identify the file as MARC21 Bibliographic record. It also correctly identified a MARC21 Authority Record. I'm running the most recent version of Ubuntu (12.04 - precise pangolin). 
I write because the inclusion of a file MARC21 specification rule in the magic.db stems from a Code4lib exchange that started in March 2011 [1] (it ends in April if you want to go crawling for the entire thread). Rgds, Kevin [1] https://listserv.nd.edu/cgi-bin/wa?A2=ind1103L=CODE4LIBT=0F=S=P=112728 -- Kevin Ford Network Development and MARC Standards Office Library of Congress Washington, DC
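For anyone curious what the rule keys on, here's a rough pure-Ruby rendering of the check as I understand it from this thread -- my reading, not the actual magic.db source:

```ruby
# Sketch of a MARC21 leader check: positions 10-11 should be "22"
# (indicator and subfield code counts), positions 20-23 the entry map
# "4500", and leader/06 = 'z' marks an authority record.
def identify_marc21(data)
  leader = data[0, 24]
  return "data" unless leader && leader.length == 24 && leader[10, 2] == "22"
  kind = leader[6] == "z" ? "MARC21 Authority" : "MARC21 Bibliographic"
  leader[20, 4] == "4500" ? kind : "#{kind} (non-conforming)"
end

puts identify_marc21("00714cam a2200205 a 4500")  # a bibliographic leader
```

Like the magic rule, this only looks at the first record's leader, so a bulk file with a bad 2,573rd record would still pass.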
[CODE4LIB] ruby-marc 0.5.0 released
v0.5.0 - Extensive rewrite of MARC::Reader (ISO 2709 binary reader) to provide fairly complete and consistent handling of char encoding issues in ruby 1.9. - This code is well covered by automated tests, but ends up complex; there may be bugs, please report them. - May not work properly under jruby with non-unicode source encodings. - Still can't handle Marc8 encoding. - May not have entirely backwards compatible behavior with regard to char encodings under ruby 1.9.x as previous 0.4.x versions did. Test your code. In particular, previous versions may have automatically _transcoded_ non-unicode encodings to UTF-8 for you. This version will not do so unless you ask it to with the correct arguments. `gem install ruby-marc -v 0.5.0` https://github.com/ruby-marc/ruby-marc
Re: [CODE4LIB] crowdsourced book scanning
ILL at most institutions does not keep scanned copies for future patrons, not even in a database that's not publicly searchable. To do so would be of highly questionable legality with regard to copyright. As would be this plan, alas. You can easily violate copyright just sharing within the (e.g.) university community, or even just among librarians; it does not need to be 'publicly searchable' to violate copyright. On 4/25/2012 2:20 PM, Ross Singer wrote: I am not sure this would be as much of a problem as long as it's not a publicly searchable database (that is, people can't browse which scans are there and choose them). Of course, this restriction makes it difficult to envision how the UI would work, but something triggered by an exact match should work. Then again, I am not a lawyer. -Ross. On Apr 25, 2012, at 2:05 PM, Andrew Shuping wrote: What type of pages from books are you talking about? Like reference materials, histories, biographies, fiction? Because while my first thought is that would be an interesting idea, my immediate second thought is that publishers and authors would never allow it to happen because of copyright. Even in ILL land we can't keep scanned pages for a long period of time due to copyright restrictions. Also this sounds a lot like the Google Books project... Andrew Shuping Interlibrary Loan/Emerging Technologies Services Librarian Jack Tarver Library Robert Frost - In three words I can sum up everything I've learned about life: it goes on. On Wed, Apr 25, 2012 at 1:36 PM, Michael Lindsey mlind...@law.berkeley.edu wrote: A colleague posed an interesting idea: patrons scan book pages to deliver to themselves by email, flash drive, etc. What if the scans didn't disappear from memory, but went into a repository so the next patron looking for that passage didn't have to jockey the flatbed scanner? * Patron scans library barcode at the scanner * The system says, I have these pages available in cache. 
o Patron's project overlaps with the cache and saves time in the scanning, or o Patron needs different pages, scans them and contributes to the cache Now imagine a consortium of some sort where when the patron scans the barcode, the system takes a hop via the ISBN number in the record to reach out to a cache developed between a number of libraries. I know there are a number of cases where this may not apply, like loose-leaf publications in binders that get updated, etc. And I'm sure there are discussions around how to handle copyright, fair use, etc. Do we as a community already have a similar endeavor in place? Michael Lindsey UC Berkeley Law Library
Re: [CODE4LIB] more on MARC char encoding
Ah, thanks Terry. That canned cleaner in MarcEdit sounds potentially useful -- I'm in a continuing battle to keep the character encoding in our local marc corpus clean. (The real blame here is on cataloger interfaces that let catalogers save data containing bytes that are illegal for the character set it's being saved as. And/or that display the data back to the cataloger using a translation that lets it show up as expected even though it is _wrong_ for the character set being saved as. Connexion is theoretically the Rolls-Royce of cataloger interfaces; does it do this? Gosh, I hope not.) On 4/19/2012 2:20 PM, Reese, Terry wrote: Actually -- the issue isn't one of MARC8 versus UTF8 (since this data is being harvested from DSpace and is UTF8 encoded). It's actually an issue with user-entered data -- specifically, smart quotes and the like. These values obviously are not in the MARC8 character set and cause problems for many who transform user-entered data (smart quotes tend to be used by default on Windows) from XML to MARC. If you are sticking with a strictly UTF8-based system, there generally are not issues because these are valid characters. If you move them into a system where the data needs to be represented in MARC -- then you have more problems. We do a lot of harvesting, and because of that, we run into these types of issues moving data that is in UTF8, but has characters not represented in MARC8, into Connexion and having some of that data flattened. Given the wide range of data not in the MARC8 set that can show up in UTF8, it's not a surprise that this would happen. My guess is that you could add a template to your XSLT translation that attempted to filter the most common forms of these smart quotes/values and replace them with the more standard values. Likewise, if there was a great enough need, I could provide a canned cleaner in MarcEdit that could fix many of the most common varieties of these smart quotes/values. 
--TR -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jonathan Rochkind Sent: Thursday, April 19, 2012 11:13 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] more on MARC char encoding If your records are really in MARC8 not UTF8, your best bet is to use a tool to convert them to UTF8 before hitting your XSLT. The open source 'yaz' command line tools can do it for Marc21. The Marc4J package can do it in java, and probably work for any MARC variant not just Marc21. Char encoding issues are tricky. You might want to first figure out if your records are really in Marc8, thus the problems, or if instead they illegally contain bad data or data in some other encoding (Latin1). Char encoding is a tricky topic, you might want to do some reading on it in general. The Unicode docs are pretty decent. On 4/19/2012 11:06 AM, Deng, Sai wrote: Hi list, I am a Metadata librarian but not a programmer, sorry if my question seems naïve. We use XSLT stylesheet to transform some harvested DC records from DSpace to MARC in MarcEdit, and then export them to OCLC. Some characters do not display correctly and need manual editing, for example: In MarcEditor Transferred to OCLC Edit in OCLC Bayes’ theorem Bayes⁰́₉ theorem Bayes' theorem ―it won‘t happen here‖ attitude ⁰́₅it won⁰́₈t happen here⁰́₆ attitude it won't happen here attitude “Generation Y” ⁰́₋Generation Y⁰́₊ Generation Y listeners‟ evaluationslisteners⁰́ evaluations listeners' evaluations high school – from high school ⁰́₃ from high school – from Co₀․₅Zn₀․₅Fe₂O₄ Co²́⁰⁰́Þ²́⁵Zn²́⁰⁰́Þ²́⁵Fe²́²O²́⁴ Co0.5Zn0.5Fe2O4? μ Îơ μ Nafion®Nafion℗ʼ Nafion® Lévy L©♭vy Lévy 43±13.20 years 43℗ł13.20 years 43±13.20 years 12.6 ± 7.05 ft∙lbs 12.6 ℗ł 7.05 ft⁸́₉lbs 12.6 ± 7.05 ft•lbs ‘Pouring on the Pounds' ⁰́₈Pouring on the Pounds
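Terry's "canned cleaner" idea above can be sketched in a few lines. The mapping here is illustrative, not MarcEdit's actual table:

```ruby
# Flatten common Windows smart punctuation to plain ASCII equivalents
# that survive a trip through MARC-8.
SMART_MAP = {
  "\u2018" => "'", "\u2019" => "'",  # curly single quotes
  "\u201C" => '"', "\u201D" => '"',  # curly double quotes
  "\u2013" => "-",                   # en dash
  "\u2026" => "..."                  # ellipsis
}.freeze

def flatten_smart_punctuation(str)
  str.gsub(Regexp.union(SMART_MAP.keys), SMART_MAP)
end

puts flatten_smart_punctuation("Bayes\u2019 theorem, \u201CGeneration Y\u201D")
```

The same substitutions could equally live in an XSLT template, as Terry suggests; a gsub with a hash is just the compact Ruby form.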
Re: [CODE4LIB] more on MARC char encoding
On 4/19/2012 3:23 PM, LeVan,Ralph wrote: We see Unicode data pasted into MARC8 records all the time. It happens enough that my MARC8-Unicode converter takes a second look at illegal MARC8 bytes and tries a UTF-8 encoding as well. Right. I see it too. I'm arguing that means cataloger entry tools, the tools which catalogers are using when they paste that stuff in, are not giving the cataloger sufficient feedback on their entry. Flag completely illegal byte sequences in the output encoding and don't let them be saved; make sure cataloger input is displayed back _as appropriate for the current encoding_, so catalogers get immediate visual feedback if they're entering bytes that don't mean what they think in the operative output encoding. I think it's possible that _no_ cataloger interfaces actually do this (although if any do, I bet it's MarcEdit). If Connexion doesn't, for interactive cataloger entry, it'd be awfully nice if it did.
[CODE4LIB] ruby-marc, better ruby 1.9 char encoding support, testers wanted
I have implemented fairly complete and robust support for character encodings in ruby-marc when reading 'binary' marc under ruby 1.9. It's currently in a git branch, not yet released, and not yet in git master. https://github.com/ruby-marc/ruby-marc/tree/char_encodings If anyone who uses this (or doesn't) has a chance to beta test it, it would be appreciated. One way to test: check out with git, switch to the 'char_encodings' branch, and `rake install` to install it as a gem on your system. These changes should _only_ affect use under ruby 1.9, and only affect reading 'binary' (ISO 2709) marc. The new functionality is pretty extensively covered by automated tests, but there are some weird and complex interactions that can occur depending on exactly what you're doing, so bugs are possible. It was somewhat more complicated than one might expect to implement a complete solution here, in part because we _do_ have international users of ruby-marc, with encodings that are neither MARC8 nor UTF8, and in fact non-MARC21. If any of the other committers (or anyone else) wants to code review, you are welcome to. POSSIBLE BACKWARDS INCOMPAT Some previous 0.4.x versions, when running under ruby 1.9 only, would automatically _transcode_ non-unicode encodings to UTF-8 for you under the hood. The new version no longer does so automatically (although you can ask it to). It was not tenable to support that backwards compatibly. Everything else _ought_ to be backwards compatible with previous 0.4.x ruby-marc under ruby 1.9, fixing many problems. NEW FEATURES All applying to ruby 1.9 only, and to reading binary MARC only. * Do a pretty good job of setting encodings properly for your ruby environment, especially under standard UTF-8 usage. * You _can_ and _do have to_ provide an argument for reading non-UTF8 encodings (but sadly there is no support for marc8). * You can ask MARC::Reader to transcode to a different encoding when loading marc. 
* You can ask MARC::Reader to replace bytes that are illegal in the believed source encoding with a replacement character (or the empty string) to avoid ruby invalid UTF-8 byte exceptions later, and sanitize your input. New features documented in inline comments, see at: http://rubydoc.info/github/ruby-marc/ruby-marc/MARC/Reader I had trouble making the docs concise, sorry, I think I've been pounding my head against this stuff so much realizing how complicated it ends up being that I wasn't sure what to leave out.
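For the curious, the two behaviors listed above rest on plain Ruby string mechanics. This is not ruby-marc's API, just the stdlib machinery underneath -- and note String#scrub arrived in ruby 2.1, after this post; on 1.9 the same effect takes encode-based workarounds:

```ruby
# Transcoding a known non-UTF-8 source encoding:
latin1 = "caf\xE9".dup.force_encoding("ISO-8859-1")
puts latin1.encode("UTF-8")        # explicit transcode: "café"

# Replacing bytes that are illegal in the believed source encoding:
bad = "caf\xE9".dup.force_encoding("UTF-8")  # \xE9 is not valid UTF-8
puts bad.valid_encoding?           # false
puts bad.scrub("")                 # illegal byte dropped: "caf"
```

Without the scrub/replace step, the invalid byte lurks in your data until some later operation raises an invalid-byte-sequence exception, which is exactly the failure mode the reader option is meant to head off.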
Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21
On 4/18/2012 6:04 AM, Tod Olson wrote: It has to mean UTF-8. ISO 2709 is very byte-oriented, from the directory structure to the byte-offsets in the fixed fields. The values in these places all assume 8-bit character data; it's completely baked in to the file format. I'm not sure that follows. One could certainly have UTF-16 in a Marc record, and still count bytes to get a directory structure and byte offsets. (In some ways it'd be easier, since every char would be two bytes.) In fact, I worry that the standard may pre-date UTF-8, with its reference to UCS -- if I understand things right, at one point there was only one unicode encoding, called UCS, which is basically a backwards-compatible subset of what became UTF-16. So I worry the standard really means UCS/UTF-16. But if in fact records in the wild with the 'a' value are far more likely to be UTF-8... well, it's certainly not the first time the MARC21 standard was useless/ignored as a standard in answering such questions.
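To make the byte-counting point concrete: directory offsets count bytes either way, and only the bytes-per-character differ by encoding:

```ruby
# ISO 2709 offsets are bytes, not characters, so either encoding can be
# directory-addressed -- the counts just differ.
title = "L\u00E9vy"                      # 4 characters
puts title.encode("UTF-8").bytesize      # 5 -- the é takes two bytes
puts title.encode("UTF-16BE").bytesize   # 8 -- every char takes two bytes
```
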
[CODE4LIB] MarcXML and char encodings
I know how char encodings work in MARC ISO binary -- the encoding can legally be either Marc8 or UTF8 (nothing else). The encoding of a record is specified in its header. In the wild, specified encodings are frequently wrong, or data includes weird mixed encodings. Okay! But what's going on with MarcXML? What are the legal encodings for MarcXML? Only Marc8 and UTF8, or anything that can be expressed in XML? The MARC header is (or can be) present in MarcXML -- trust the MARC header, or trust the XML doctype char encoding? What's the legal thing to do? What's actually found 'in the wild' with MarcXML? Can anyone advise? Jonathan
Re: [CODE4LIB] MarcXML and char encodings
So what if the ?xml? declaration says one charset encoding, but the MARC header included in the MarcXML says a different encoding... which one is the 'legal' one to believe? Is it legal to have MarcXML that is not UTF-8 _or_ Marc8, that is, an entirely different charset that is legal in XML? If you did that, what should the MARC header included in the XML say? I know how char encodings work in XML. I don't understand what the standards say about how that interacts with the MARC data in MarcXML. Jonathan On 4/17/2012 1:51 PM, LeVan,Ralph wrote: There are probably a couple of answers to that. XML rules define what characterset is used. The encoding attribute on the ?xml? header is where you find out what characterset is being used. I've always gone under the assumption that if an encoding wasn't specified, then UTF-8 is in effect, and that has always worked for me. It turns out the standard says US-ASCII is the default encoding. But, ignoring the encoding, the original MarcXML rules were the same as the MARC-21 rules for character repertoire, and you were supposed to restrict yourself to characters that could be mapped back into MARC-8. I don't know if that rule is still in force, but everyone ignores it. I hope that helps! Ralph -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jonathan Rochkind Sent: Tuesday, April 17, 2012 12:35 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: MarcXML and char encodings I know how char encodings work in MARC ISO binary -- the encoding can legally be either Marc8 or UTF8 (nothing else). The encoding of a record is specified in its header. In the wild, specified encodings are frequently wrong, or data includes weird mixed encodings. Okay! But what's going on with MarcXML? What are the legal encodings for MarcXML? Only Marc8 and UTF8, or anything that can be expressed in XML? The MARC header is (or can be) present in MarcXML -- trust the MARC header, or trust the XML doctype char encoding? 
What's the legal thing to do? What's actually found 'in the wild' with MarcXML? Can anyone advise? Jonathan
Re: [CODE4LIB] MarcXML and char encodings
On 4/17/2012 1:57 PM, Kyle Banerjee wrote: In some cases, invalid XML. In an ideal world, the encoding should be included in the declaration. But I wouldn't trust it. kyle So would you use the Marc header payload instead? Or are you just saying you wouldn't trust _any_ encoding declarations you find anywhere? When writing a library to handle marc, I think the baseline should be making it do the official, legal, standards-compliant right thing. Extra heuristics to deal with invalid data can be added on top. But my trouble here is I can't even figure out what the official, legal, standards-compliant thing is. Maybe that's because the MarcXML standard simply doesn't address it, and it's all implementation dependent. Sigh. The problem is how the XML document's own char encoding is supposed to interact with the MARC header; especially because there's no way to put Marc8 in an XML char encoding doctype (is there?); and whether encodings other than Marc8 or UTF8 are legal in MarcXML, even though they aren't in MARC ISO binary. I think the answer might be that nobody knows, and there is no standard right way to do it. Which is unfortunate.
Re: [CODE4LIB] MarcXML and char encodings
Okay, maybe here's another way to approach the question. If I want to have a MarcXML document encoded in Marc8 -- what should it look like? What should be in the XML declaration? What should be in the MARC header embedded in the XML? Or is it not in fact legal at all? If I want to have a MarcXML document encoded in UTF8, what should it look like? What should be in the XML declaration? What should be in the MARC header embedded in the XML? If I want to have a MarcXML document with a char encoding that is _neither_ Marc8 nor UTF8, but something else generally legal for XML -- is this legal at all? And if so, what should it look like? What should be in the XML declaration? What should be in the MARC header embedded in the XML? On 4/17/2012 1:57 PM, Kyle Banerjee wrote: What's the legal thing to do? What's actually found 'in the wild' with MarcXML? In some cases, invalid XML. In an ideal world, the encoding should be included in the declaration. But I wouldn't trust it. kyle
Re: [CODE4LIB] MarcXML and char encodings
Thanks, this is helpful feedback at least. I think it's completely irrelevant, when determining what is legal under standards, to talk about what certain Java tools happen to do, though; I don't care too much what some tool you happen to use does. In this case, I'm _writing_ the tools. I want to make them do 'the right thing', with some mix of what's actually officially, legally correct and what's practically useful. What your Java tools do is more or less irrelevant to me. I certainly _could_ make my tool respect the Marc leader encoded in MarcXML over the XML declaration if I wanted to. I could even make it assume the data is Marc8 in XML, even though there's no XML charset type for it, if the leader says it's Marc8. But do others agree that there is in fact no legal way to have Marc8 in MarcXML? Do others agree that you can use non-UTF8 encodings in MarcXML, so long as they are legal XML? I won't even ask someone to cite standards documents, because it's pretty clear that LC forgot to consider this when establishing MarcXML. (And I have no faith that one could get LC to make a call on this and publish it any time this century.) Has anyone seen any Marc8-encoded MarcXML in the wild? Is it common? How is it represented with regard to the XML declaration and the Marc header? Has anyone seen any MarcXML with char encodings that are neither Marc8 nor UTF8 in the wild? Are they common? How are they represented with regard to the XML declaration and Marc header? On 4/17/2012 2:32 PM, LeVan,Ralph wrote: If I want to have a MarcXML document encoded in Marc8 -- what should it look like? What should be in the XML declaration? What should be in the MARC header embedded in the XML? Or is it not in fact legal at all? I'm going out on a limb here, but I don't think it is legal. There is no formal encoding that corresponds to MARC-8, so there's no way to tell XML tools how to interpret the bytes. If I want to have a MarcXML document encoded in UTF8, what should it look like? 
What should be in the XML declaration? What should be in the MARC header embedded in the XML? <?xml version="1.0" encoding="UTF-8"?> I suppose you'll want to set the leader to UTF-8 as well, but it doesn't really matter to any XML tools. If I want to have a MarcXML document with a char encoding that is _neither_ Marc8 nor UTF8, but something else generally legal for XML
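For what it's worth, the two places an encoding can be advertised -- the XML declaration and leader byte 09 -- can be inspected with a few lines of Python. This is only an illustrative sketch using the standard library; the sample record and the helper name are invented:

```python
import io
import xml.etree.ElementTree as ET

# A made-up one-record MARCXML document; element names follow the
# MARC21 slim schema, but the record content is invented.
MARCXML = b"""<?xml version="1.0" encoding="UTF-8"?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
</record>"""

def declared_encodings(xml_bytes):
    """Return (encoding named in the XML declaration, leader byte 09).

    These are the two places an encoding can be advertised, and as the
    thread notes, nothing forces them to agree.
    """
    # Crude scan of the XML declaration for its encoding pseudo-attribute.
    prolog = xml_bytes.split(b"?>", 1)[0].decode("ascii", "replace")
    xml_enc = None
    if "encoding=" in prolog:
        xml_enc = prolog.split("encoding=")[1].split('"')[1]
    # Leader/09: 'a' = Unicode, blank = MARC-8 (per MARC21).
    root = ET.parse(io.BytesIO(xml_bytes)).getroot()
    leader = root.find("{http://www.loc.gov/MARC21/slim}leader").text
    return xml_enc, leader[9]
```

A tool along the lines Jonathan describes could compare the two values and decide which one to trust when they disagree.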
Re: [CODE4LIB] MarcXML and char encodings
On 4/17/2012 3:01 PM, Sheila M. Morrissey wrote: No -- it is perfectly legal -- but you MUST declare the encoding to BE Marc8 in the XML prolog. Wait, how can you declare a Marc8 encoding in an XML declaration/prolog/whatever it's called? The encoding names that appear there need to come from a specific list, and I didn't think Marc8 was on that list? Can you give me an example? And, if you happen to have it, a link to the XML standard that says this is legal?
[CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21
Okay, forget XML for a moment; let's just look at MARC 'binary'. First, for Anglophone-centric MARC21. The LC docs don't actually say quite what I thought about leader byte 09, used to advertise encoding: "a - UCS/Unicode: Character coding in the record makes use of characters from the Universal Coded Character Set (UCS) (ISO 10646), or Unicode™, an industry subset." That doesn't say UTF-8. It says UCS or Unicode. What does that actually mean? Does it mean UTF-8, or does it mean UTF-16 (closer to what used to be called UCS, I think)? Whatever it actually means, do people violate it in the wild? Now we get to non-Anglophone-centric MARC, all of which is ISO 2709, I think? A standard which of course is not open access, so I can't get it to see what it says. But leader byte 09 being used for encoding -- is that MARC21-specific, or is it true of any ISO 2709? Marc8 and Unicode being the only valid encodings can't be true of any ISO 2709, right? Is there a generic ISO 2709 way to deal with this, or not so much?
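A sketch of reading that leader byte from a binary record, assuming the MARC21 convention that blank means MARC-8 and 'a' means "UCS/Unicode" (whether that means UTF-8 or UTF-16 is exactly the ambiguity raised above); the sample leaders are invented:

```python
def marc21_declared_encoding(record: bytes) -> str:
    """Read leader byte 09 from a binary MARC21 record.

    Per the MARC21 spec quoted above: blank means MARC-8, 'a' means
    'UCS/Unicode' (which, as noted, doesn't pin down UTF-8 vs UTF-16).
    """
    flag = chr(record[9])
    return {" ": "MARC-8", "a": "UCS/Unicode"}.get(flag, "undefined: %r" % flag)

# Two invented 24-byte leaders, differing only in byte 09:
unicode_leader = b"00024nam a2200000 a 4500"
marc8_leader   = b"00024nam  2200000 a 4500"
```

Whether this flag can be trusted in real-world data is, of course, the subject of the pymarc thread further down.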
Re: [CODE4LIB] Silently print (no GUI) in Windows
If you had PDFs, you could probably do it. But if you have a bunch of different proprietary application files, each one is different and needs software that can interpret the file and turn it into a print job (PostScript, or whatever). Normally this software is the 'full application' that owns the format, say Microsoft Word. The particular application may come with software to 'silently' print, but most probably don't. The particular format may have a competitor that can open it (say, OpenOffice for Microsoft Word), and an open source competitor is perhaps more likely to have such 'silent printing' ability -- but it would still need to be done on a format-by-format basis. I don't know if anyone's selling software that tries to do what you're talking about for a multitude of popular formats, but it's pretty much impossible for there to be software that can do it for every/any format. I think you're not going to have much luck. Perhaps you could figure out a way to use some kind of Windows 'macro' program to actually open up each document in the 'full application' and choose File/Print, but to do this unattended. I am not familiar with such software. On 4/3/2012 2:48 PM, Kozlowski, Brendon wrote: Not a dumb question at all. In this particular case, the receiving PC that is to be storing/printing the documents will be taking jobs from multiple networks, buildings, etc., by either piping an email account or downloading via a user's upload from a webpage. We already have a solution for catching jobs in the print spooler (not ours), but need to automate the sending of the documents to the spooler itself. The only way I've ever sent documents to the spooler was by opening up the full application (ex: Microsoft Word) and using the GUI to send the print job. Since the PC housing and releasing these files is expected to be un-manned and sit in a back room, we just need to be able to silently print the jobs in the background.
Opening multiple applications over and over again would use up a lot of resources, so a silent, no-GUI option would be the best, from my very limited understanding -- if it's even possible. Brendon Kozlowski Web Administrator Saratoga Springs Public Library 49 Henry Street Saratoga Springs, NY, 12866 [518] 584-7860 x217 From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Kyle Banerjee [baner...@uoregon.edu] Sent: Tuesday, April 03, 2012 1:25 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Silently print (no GUI) in Windows At the risk of asking a dumb question, why wouldn't a print server meet your use case if the print jobs come from elsewhere? kyle On Tue, Apr 3, 2012 at 9:15 AM, Kozlowski, Brendon bkozlow...@sals.edu wrote: I'm curious to know if anyone has discovered ways of silently printing documents from such Windows applications as: - Acrobat Reader (current version) - Microsoft Office 2007 (Word, Excel, PowerPoint, Visio, etc...) - Windows Picture and Fax Viewer I unfortunately haven't had much luck finding any resources on this. I'd like to be able to receive documents in a queue-like fashion on a single PC and simply print them off as they arrive. However, automating the loading/exiting of the full-blown application each time, and on demand, seems a little too cumbersome and unnecessary. I have not yet decided whether I'd be scripting it (PHP, AutoIt, batch files, VBS, PowerShell, etc...) or learning and then writing a .NET application. If .NET solutions use the COM object, the scripting becomes a potential candidate. Unfortunately I need to know how, or even if, it's possible to do first. Thank you for any and all feedback or assistance. Brendon Kozlowski Web Administrator Saratoga Springs Public Library 49 Henry Street Saratoga Springs, NY, 12866 [518] 584-7860 x217 Please consider the environment before printing this message.
To report this message as spam, offensive, or if you feel you have received this in error, please send e-mail to ab...@sals.edu including the entire contents and subject of the message. It will be reviewed by staff and acted upon appropriately. -- Kyle Banerjee Digital Services Program Manager Orbis Cascade Alliance baner...@uoregon.edu / 503.999.9787
Re: [CODE4LIB] Anyone implementing common LIS applications on PaaS providers?
Older 3.x versions of Blacklight may have put a solrmarc.jar inside your app's ./config/SolrMarc; that may not be caught by your slug ignore. This was an error -- it was never meant to do that -- and if you have one in a BL 3.x app you should be safe to remove it. Other than that, I'm curious what's making a BL app so large! Incidentally, you don't need a ./jetty in your local app _at all_, unless you actually want to keep a jetty Solr there. BL will optionally install one there, but it's not required. (Does slug size include your gem dependencies? I am not familiar with Heroku. Because the BL gem itself _does_ also include a SolrMarc.jar; if that's a problem, we'd have to refactor things on the BL side to make it an optional dependency instead of baked into BL.) On 3/29/2012 12:37 PM, Chris Fitzpatrick wrote: Hey Sean, Jah, I did that... my .slugignore is: tmp/* log/* coverage/* spec/* koha/* jetty/* That dropped it down to 30 from ~50mb, so that's good. (koha has some scripts I wrote to pull from our ILS.) I think the slug size is a really minor issue. Heroku says under 25mb is good, but over 50mb is not so good. Not Good, but not Chaotic Evil. Neutral Good. On Thu, Mar 29, 2012 at 6:26 PM, Sean Hannan shan...@jhu.edu wrote: If you already have everything indexed in Solr elsewhere, a way to cut down the BL slug size is to remove/ignore the SolrMarc.jar. It's pretty sizable. -Sean On 3/29/12 12:16 PM, Chris Fitzpatrick chrisfitz...@gmail.com wrote: Hi, I've deployed Blacklight on both Heroku and Elastic Beanstalk. Heroku is still a much better choice. The only issue I had was I needed to make sure the sass-rails gem is installed in the :production gem group and not just development. I still have an issue getting Heroku to compile all my sass/coffeescript/etc. assets on update, but it actually doesn't seem to make much of an impact on performance. The minor issue is that it would be nice to figure out a way to slim down BL's slug size.
The lowest I've been able to get it is about 30mb, and Heroku recommends having it be below 25mb. I have not used Heroku's Solr service (I still use EC2 for my Solr deployments). EngineYard would be another option. There is also an AMI for DSpace, so deploying that to EC2 should be pretty easy. b, chris. On Thu, Mar 29, 2012 at 3:55 PM, Rosalyn Metz rosalynm...@gmail.com wrote: Erik, I haven't tried it (recently) on PaaS providers, but I have on IaaS. The AMIs I've created in association with start-up scripts (if you're interested in seeing those let me know, I'd have to look for them somewhere or other) mean that the application automagically starts up on its own; all you need to do is go to the URL. I've used this as a back-up method in the past and I think it would be a great way for people to be able to play with the different apps before committing. To this end, I created an AMI for Blacklight a while back: http://www.rosalynmetz.com/ami-3c10f255/ I guarantee you it is grossly out of date. I also have instructions on creating an EBS-backed AMI: http://rosalynmetz.com/ideas/2011/04/14/creating-an-ebs-backed-ami/ which is the method I used for creating the Blacklight AMI. These instructions are also fairly old, but I still get comments on my blog now and then that the method works. I also played around with it on Heroku, but that was so long ago I don't think any of the things I learned still apply (this was when Heroku was fairly new to the scene). Hope some of this helps. Rosalyn On Thu, Mar 29, 2012 at 8:34 AM, Seth van Hooland svhoo...@ulb.ac.be wrote: Dear Erik, Bram Wiercx and myself have given a talk on how to put together a package to install CollectiveAccess on Red Hat's OpenShift: http://www.dish2011.nl/sessions/open-source-software-platform-collectiveaccess-as-a-service-solution . My students are currently happily playing around with CollectiveAccess, which they have installed on OpenShift.
My teaching assistant Max De Wilde has developed clear guidelines on how to run the installation procedure: http://homepages.ulb.ac.be/~svhoolan/redhat_ca_install.pdf. It would be wonderful to aggregate these kinds of installation procedures for other types of LIS applications... Kind regards and looking forward to your book! Seth van Hooland Président du Master en Sciences et Technologies de l'Information et de la Communication (MaSTIC) Université Libre de Bruxelles Av. F.D. Roosevelt, 50 CP 123 | 1050 Bruxelles http://homepages.ulb.ac.be/~svhoolan/ http://twitter.com/#!/sethvanhooland http://mastic.ulb.ac.be 0032 2 650 4765 Office: DC11.113 On 29 March 2012 at 14:10, Erik Mitchell wrote: Hi all, I have been toying with the process of implementing common LIS applications (e.g. VuFind, DSpace, Blacklight...) on PaaS providers like Heroku and Amazon Elastic Beanstalk. I have just tried out-of-the-box distributions so far and have not made much progress, but was wondering if someone else had tried this or had ideas
Re: [CODE4LIB] Anyone implementing common LIS applications on PaaS providers?
On 3/29/2012 5:05 PM, Chris Fitzpatrick wrote: locally and push them rather than rely on Heroku to precompile them (currently when I push, Heroku's precompile fails, so it reverts to compile-at-runtime mode) if anyone has insight into this, please lemme know... I believe having them compile at runtime does slow down the application... I have no idea why it's not working on Heroku -- no experience with Heroku (although I'm familiar with the concept). But compile-at-runtime _will_ slow down your app, yeah. Here's a Stack Overflow question I asked about it myself: http://stackoverflow.com/questions/8821864/config-assets-compile-true-in-rails-production-why-not Compiling locally and then pushing should work, and is arguably better in some ways (why waste cycles on the production machine compiling assets?). But if you choose to compile and check the assets into your source control repo, here's a trick that will keep your on-disk compiled assets from driving you crazy in development... eh, I can't find the blog post on Google now, but it's something like setting config.assets.prefix = "/dev-assets" in environments/development.rb, so in development Rails will ignore your on-disk compiled assets.
Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records
a) Mis-characterized MARC char encodings are common amongst many of our corpuses and ILSs. It is a common problem. It can be very inconvenient. Not only Marc8 that says it's UTF8 and vice versa, but also data that says it's MARC8 or UTF8 but is actually neither. b) One solution would be having the MARC tool pass the char stream through as-is without complaining, like Godmar suggested; another would be trying to heuristically guess the 'real' encoding, like Gabe suggests; personally I favor a different solution: the thing that's encoding as Unicode on the way out? Instead of raising on an invalid char, it should have the option of silently eating it, replacing it with either the empty string or the Unicode replacement character ("used to replace an incoming character whose value is unknown or unrepresentable in Unicode" [http://www.fileformat.info/info/unicode/char/fffd/index.htm]). I have worked with character encoding libraries before that have this option: replace messed-up bytes with the Unicode replacement char. I don't know what's available in Python, though. Jonathan On 3/8/2012 3:19 PM, Gabriel Farrell wrote: Sounds like what you do, Terry, and what we need in PyMARC, is something like UnicodeDammit [0]. Actually handling all of these esoteric encodings would be quite the chore, though. I also used to think it would be cool if we could get MARC8 encoding/decoding into the Python standard library, but then I realized I'd rather work on other stuff while MARC8 withers and dies. [0] https://github.com/bdoms/beautifulsoup/blob/master/BeautifulSoup.py#L1753 On Thu, Mar 8, 2012 at 2:36 PM, Reese, Terry terry.re...@oregonstate.edu wrote: This is one of the reasons you really can't trust the information found in position 9.
This is one of the reasons why, when I wrote MarcEdit, I utilized a mixed process when working with data and determining character set -- a process that reads this byte and takes the information under advisement, but in the end treats it more as a suggestion, one part of a larger heuristic analysis of the record data to determine whether the information is in UTF8 or not. Fortunately, determining if a set of data is in UTF8 or something else is a fairly easy process. Determining the something else is much more difficult, but generally not necessary. For that reason, if I were advising other people working on MARC processing libraries, I'd advocate having a process for recognizing that certain informational data may not be set correctly, and essentially utilizing a compatibility process to read and correct it. Because unfortunately, while the number of vendors and systems that set this encoding byte correctly has increased dramatically (it used to be pretty much no one), it's still so uneven that I generally consider this information unreliable. --TR -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Godmar Back Sent: Thursday, March 08, 2012 11:01 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records On Thu, Mar 8, 2012 at 1:46 PM, Terray, James james.ter...@yale.edu wrote: Hi Godmar, UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 9: ordinal not in range(128) Having seen my fair share of these kinds of encoding errors in Python, I can speculate (without seeing the pymarc source code, so please don't hold me to this) that it's the Python code that's not set up to handle the UTF-8 strings from your data source. In fact, the error indicates it's using the default 'ascii' codec rather than 'utf-8'. If it said "'utf-8' codec can't decode...", then I'd suspect a problem with the data.
If you were to send the full traceback (all the gobbledy-gook that Python spews when it encounters an error) and the version of pymarc you're using to the program's author(s), they may be able to help you out further. My question is less about the Python error, which I understand, than about the MARC record causing the error and about how others deal with this issue (if it's a common issue, which I do not know.) But, here's the long story from pymarc's perspective. The record has leader[9] == 'a', but really, truly contains ANSEL-encoded data. When reading the record with a MARCReader(to_unicode = False) instance, the record reads ok since no decoding is attempted, but attempts at writing the record fail with the above error since pymarc attempts to utf8 encode the ANSEL-encoded string which contains non-ascii chars such as 0xe8 (the ANSEL Umlaut prefix). It does so because leader[9] == 'a' (see [1]). When reading the record with a MARCReader(to_unicode=True) instance, it'll throw an exception during marc_decode when trying to utf8-decode the ANSEL-encoded string. Rightly so. I don't blame pymarc for this behavior; to me, the record looks wrong. - Godmar (ps: that said, what pymarc does fails
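For what it's worth, the "eat the bad byte and emit the replacement character" option Jonathan asks about is available in Python's built-in codec machinery via errors="replace". A minimal sketch, using Godmar's 0xE8 ANSEL umlaut prefix as the offending byte (the sample bytes are invented):

```python
# 0xE8 is the ANSEL umlaut prefix Godmar mentions; as a lone byte it is
# not valid UTF-8, so a strict decode raises UnicodeDecodeError.
raw = b"M\xe8uller"  # invented sample: MARC-8 bytes mislabeled as UTF-8

try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# errors="replace" substitutes U+FFFD for each undecodable byte instead of
# raising: the damage stays visible, but processing continues.
repaired = raw.decode("utf-8", errors="replace")
```

Here `strict_ok` ends up False and `repaired` contains a visible U+FFFD where the bad byte was, which is exactly the "make the error visible without stopping the process" behavior argued for below.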
Re: [CODE4LIB] Q.: MARC8 vs. MARC/Unicode and pymarc and misencoded III records
Oh, and why do I favor this solution? Compared to passing input through as-is: you're just prolonging the pain; something downstream is still going to have a problem with it, and outputting known-illegal data is not a good idea. Compared to heuristically guessing the encoding: heuristic guessing is okay, but obviously a good deal harder than just replacing bad data with the Unicode replacement glyph. But honestly, I don't _want_ this kind of mis-encoded data to be completely transparent -- I want it to do something to make the error visible (without stopping the app or data transformation process in its tracks), so catalogers can't possibly think that the data is just fine. If you use heuristics to guess, sometimes those heuristics will fail -- and when they do, the catalogers will think there's something wrong with your logic: but it works fine for all the other records that you say have the same problem, why can't it work fine for this one? But this is partially a result of my general conclusion, from experience, about trying to heuristically 'autocorrect' bad MARC data -- I try to do it as minimally as possible. It's too easy to get into a long battle of trying to make your heuristics better, instead of focusing on, you know, actually fixing the data. Now, a place where I'd be willing to use heuristics: a bulk process to try to actually fix the data in your ILS. Something that goes through all your MARC and flags records that aren't legal for the encoding they claim to be. If you want to add heuristics there to try to guess what encoding they really are and automatically fix 'em, that doesn't seem a terrible idea to me. But working around the problem with heuristics at higher levels does; spend time on actually fixing the bad data instead.
Bad MARC data, including illegal char encodings, is a continual inconvenience: you work around it in your pymarc-based software, and eventually you'll have some other software in a different language that you have to duplicate your workarounds in. On 3/8/2012 3:45 PM, Jonathan Rochkind wrote: [...]
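Terry's point that determining whether a set of data is in UTF8 is "a fairly easy process" can be sketched in Python; a strict decode either succeeds or raises. This is only an illustrative helper, not MarcEdit's actual logic:

```python
def looks_like_utf8(raw: bytes) -> bool:
    """A strict UTF-8 decode either succeeds or raises; stray MARC-8
    multi-byte sequences almost never form valid UTF-8 by accident."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

One caveat worth noting: pure-ASCII data is valid in both UTF-8 and MARC-8, so this test only discriminates for records that actually contain multi-byte sequences -- which also matches Terry's observation that "determining the something else is much more difficult."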
Re: [CODE4LIB] Microsoft Transact-SQL
Then you might be best off starting with a really good book on SQL in general, or 'standard' SQL. On 3/6/2012 1:42 PM, Wilfred Drew wrote: It is actually for a job I am interested in. I have no in-depth SQL experience at all, just some using Access. -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jon Gorman Sent: Tuesday, March 06, 2012 1:39 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Microsoft Transact-SQL On Tue, Mar 6, 2012 at 11:05 AM, Wilfred Drew dr...@tc3.edu wrote: I did mean Transact-SQL!! Sorry. I am after book recommendations. Right, sorry, should have made myself clearer. Do you have previous experience with creating database queries? I can't say I have any real recommendations, but it might help others. (And you might be able to get away with a more general book on SQL and then look through the online documentation for specific problems.) Jon Gorman
Re: [CODE4LIB] Repositories, OAI-PMH and web crawling
IF your HTML includes embedded semantic data using HTML5 microdata or RDFa or something similar (using a standard vocabulary -- the standard for repositories seems to be DC-based, since that's often all you can get out of OAI-PMH anyway) -- then web crawling combined with sitemaps probably provides about as much functionality as OAI-PMH. But embedded semantic metadata is key. However, even in the current OAI-PMH-considered-standard-best-practice world, the document-level metadata from repositories is often _extremely_ basic, as well as often unreliable. This severely limits the uses harvesters can put harvests to. So it's not necessarily really about OAI-PMH vs. web crawling; it's about sufficient, and sufficiently reliable, metadata. And even in the OAI-PMH world, we rarely have it. Note for instance that OAIster and similar harvesters are _unable to know_ whether a harvested document is open-access full text or not. That seems like something you'd want to tell people in their search results, right? They might only want stuff that they can actually access. But it's not really possible, because most (all?) repos do not expose any standard metadata in their OAI-PMH that would specify this. On 3/1/2012 9:38 AM, Ian Ibbotson wrote: Owen... Just wanted to say that, whilst I've been silent since my initial response, I'm not sure I agree with all the viewpoints presented here. From the point of view of (for example) CultureGrid, I'm not sure what has been done could have been pragmatically achieved solely with web crawling as it's described in this thread. I don't have a problem with anything that's been written here. It certainly represents a great cross-section of viewpoints. However, from a JISC Discovery perspective, I don't want to contribute to any confirmation bias that we could dispose of pesky old OAI. I'd be interested in providing a counter-point to any best practice document that suggested we could. Ian.
On Thu, Mar 1, 2012 at 12:36 PM, Owen Stephenso...@ostephens.com wrote: Thanks Jason and Ed, I suspect within this project we'll keep using OAI-PMH because we've got tight deadlines and the other project strands (which do stuff with the harvested content) need time from the developer. At the moment it looks like we will probably combine OAI-PMH with web crawling (using nutch) - so use data from the However, that said, one of the things we are meant to be doing is offering recommendations or good practice guidelines back to the (repository) community based on our experience. If we have time I would love to tackle the questions (a)-(d) that you highlight here - perhaps especially (a) and (c). Since this particular project is part of the wider JISC 'Discovery' programme (http://discovery.ac.uk and tech principles at http://technicalfoundations.ukoln.info/guidance/technical-principles-discovery-ecosystem) - from which one of the main themes might be summarised as 'work with the web' these questions are definitely relevant. I need to look at Jason's stuff again as I think this definitely has parallels with some of the Discovery work, as, of course, does some of the recent discussion on here about the question of the indexing of library catalogues by search engines. Thanks again to all who have contributed to the discussion - very useful Owen Owen Stephens Owen Stephens Consulting Web: http://www.ostephens.com Email: o...@ostephens.com Telephone: 0121 288 6936 On 1 Mar 2012, at 11:42, Ed Summers wrote: On Mon, Feb 27, 2012 at 12:15 PM, Jason Ronallojrona...@gmail.com wrote: I'd like to bring this back to your suggestion to just forget OAI-PMH and crawl the web. I think that's probably the long-term way forward. I definitely had the same thoughts while reading this thread. Owen, are you forced to stay within the context of OAI-PMH because you are working with existing institutional repositories? 
I don't know if it's appropriate, or if it has been done before, but as part of your work it would be interesting to determine: a) how many IRs allow crawling (robots.txt or lack thereof) b) how many IRs support crawling with a sitemap c) how many IR HTML splashpages use the rel-license [1] pattern d) how many IRs support syndication (RSS/Atom) to publish changes If you could do this in a semi-automated way for the UK it would be great if you could then apply it to IRs around the world. It would also align really nicely with the sort of work that Jason has been doing around CAPS [2]. It seems to me that there might be an opportunity to educate digital repository managers about better aligning their content w/ the Web ... instead of trying to cook up new standards. I imagine this is way out of scope for what you are currently doing--if so, maybe this can be your next grant :-) //Ed [1] http://microformats.org/wiki/rel-license [2] https://github.com/jronallo/capsys
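Point (a) in Ed's list is easy to check mechanically. A minimal sketch using Python's standard-library robots.txt parser; the repository URL and policy below are invented examples:

```python
import urllib.robotparser

def crawl_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Does this robots.txt permit fetching the given URL? (Point (a) above.)"""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# Invented example repository policy:
robots = """User-agent: *
Disallow: /admin/
"""
```

A semi-automated survey of IRs could fetch each repository's /robots.txt and run this check against a sample item URL; points (b)-(d) would similarly reduce to fetching the sitemap, scanning splash pages for rel-license, and looking for feed autodiscovery links.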
Re: [CODE4LIB] Local catalog records and Google, Bing, Yahoo!
On 2/23/2012 1:37 PM, Sean Hannan wrote: Anecdotally, it would appear that Bing (and Bing-using Yahoo) seems to drastically play down catalog records in their results. We're not doing anything to favor a particular search engine; we have a completely open robots.txt file. I think they're probably right to play down catalog records! I wonder how many people searching on Google and ending up at our catalog are actually satisfied with what they get there -- info on how to check the book out if they were affiliated with our university, or where to find it on the shelves if they come to Baltimore? An electronic copy that most of the time they can't access without being affiliated with our university?
Re: [CODE4LIB] Local catalog records and Google, Bing, Yahoo!
On 2/23/2012 2:45 PM, Karen Coyle wrote: This links to thoughts I've had about linked data and finding a way to use library holdings over the Web. Obviously, bibliographic data alone is not a full service: people want to get the stuff once they've found out that such stuff exists. So how do we get users from the retrieval of a bibliographic record to a place where they have access to the stuff? I see two options: the WorldCat model, where people get sent to a central database where they input their zip code, or a URL-like model where they get a link on retrievals that has knowledge about their preferred institution and access. I think we need both of those, and mixtures between the two, and more. OCLC is trying to do the second one too, for instance with their link resolver redirector. But it requires link resolvers being registered, link resolvers working, link resolvers working for print materials, etc. Of course "get a link on retrievals" begs the question of from where they are retrieving and who is generating this link. But in theory, anyone with a retrieval system could give you a link through OCLC's link resolver redirector. Which isn't quite fleshed out yet, but theoretically could then redirect you to the link resolver of your choice based on preferences or proximity. Except, well, it doesn't work that well, for a variety of reasons both under and not under OCLC's control. But it's the sort of architecture we're talking about, I think. (Now, if there were a common machine-readable response format for link-resolver-type requests, an OCLC-like service could even aggregate the responses from _several_ preferred institutions on one page. Umlaut originally tried to do that with SFX link resolvers, but it never really went anywhere.) Anyhow, yeah, both of those, and more. They definitely aren't mutually exclusive, and the sorts of technologies and metadata ecologies needed to support each one have a whole lot of overlap.
Incidentally, my Umlaut software, mostly targeted at academic libraries, is really focused on that exact problem: people want to get the stuff once they've found out that such stuff exists. So how do we get users from the retrieval of a bibliographic record to a place where they have access to the stuff? But it's definitely not done yet; it's my goal with Umlaut, but there's still a lot left to do to get there. (Ultimately, you need some kind of LibX-type approach, browser plugin or javascript bookmarklet, to get people to a place where they have access from third parties that have absolutely no interest in collaborating on this plan. Amazon doesn't want to help you go anywhere other than Amazon to acquire a book.) Definitely a work in progress, but the goal it's oriented to is exactly what you say. https://github.com/team-umlaut/umlaut Jonathan I have no idea if the latter is feasible on a true web scale, but it would be my ideal solution. We know that search engines keep track of your location and tailor retrievals based on that. Could libraries get into that loop? kc On 2/23/12 11:35 AM, Eoghan Ó Carragáin wrote: That's true, but since Blacklight/VuFind often sit over digital/institutional repositories as well as ILS systems and subscription resources, at least some public domain content gets found that otherwise wouldn't be. As you said, even if the item isn't available digitally, for Special Collections libraries unique materials are exposed to potential researchers who'd never have known about them. Eoghan On 23 February 2012 19:25, Sean Hannan shan...@jhu.edu wrote: It's hard to say. Going off of the numbers that I have, I'd say that they do find what they are looking for, but unless they are a JHU affiliate, they are unable to access it. Our bounce rate for Google searches is 76%. Which is not necessarily bad, because we put a lot of information on our item record pages -- we don't make you dig for anything.
On the other hand, 9% of visits coming to us through Google searches are return visits. To me, that says that the other 91% are not JHU affiliates, and that's 91% of Google searchers that won't have access to materials. I know from monitoring our feedback form, we have gotten an increase in requests from far-flung places for access to things we have in special collections from non-affiliates. So, we get lots of exposure via searches, but due to the nature of how libraries work with subscriptions, licensing, membership and such, we close lots of doors once they get there. -Sean On 2/23/12 1:55 PM, Schneider, Wayne <wschnei...@hclib.org> wrote: This is really interesting. Do you have evidence (anecdotal or otherwise) that the people coming to you via search engines found what they were looking for? Sorry, I don't know exactly how to phrase this. To put it another way - are your patrons finding you this way? wayne
[CODE4LIB] How to get from what you've found to access:
Changing the subject line, 'cause this is an interesting topic on its own. On 2/23/2012 2:45 PM, Karen Coyle wrote: This links to thoughts I've had about linked data and finding a way to use library holdings over the Web. Obviously, bibliographic data alone is not a full service: people want to get the stuff once they've found out that such stuff exists. So how do we get users from the retrieval of a bibliographic record to a place where they have access to the stuff? I think this is exactly right as a problem libraries (which provide various methods of access to items people may find out about elsewhere) should be focusing on more than they do. I mentioned in a previous reply that this is in fact exactly the mission of Umlaut. A better direct link for people interested in Umlaut than the one I pasted before: https://github.com/team-umlaut/umlaut/wiki It's definitely a work in progress, like I said; I'm not saying Umlaut solves this problem. But the thinking behind Umlaut is that you've got to have software which can take a machine-described thing someone is interested in and tell them everything that your institution (which they presumably are affiliated with) can do for them for that item. That's exactly what Umlaut tries to do, providing a platform that you can use to piece together information from your various silo'd knowledge bases and third-party resources, etc. And including ALL your stuff, monographs etc., not just journal articles like typical link resolvers. After that (and even that is hard), you've got to figure out how to _get_ people from where they've found out about something to your service for telling them what they can do with it via your institution. That's not an entirely solved problem. One reason Umlaut speaks OpenURL is that there is already a substantial infrastructure of things in the academic market that can express a thing someone knows about in OpenURL and send it to your local software (including Google Scholar). But it's still not enough. 
Ultimately some kind of LibX approach is probably required -- whether a browser plugin, or a javascript bookmarklet (same sort of thing, different technology), a way to get someone from a third party to your 'list of services', even when that third party is completely uninterested in helping them get there (Amazon doesn't particularly want to help someone who starts at Amazon get somewhere _else_ to buy the book! Others may be less hostile, but just not all that interested in spending any energy on it). Jonathan
Re: [CODE4LIB] Local catalog records and Google, Bing, Yahoo!
On 2/23/2012 3:53 PM, Karen Coyle wrote: Jonathan, while having these thoughts your Umlaut service did come to mind. If you ever have time to expand on how it could work in a wide open web environment, I'd love to hear it. (I know you explain below, but I don't know enough about link resolvers to understand what it really means from a short explanation. Diagrams are always welcome!) I'm not entirely sure what is meant by 'wide open web environment.' I mean, part of the current environment is that there's lots of stuff on the web that is NOT free/open access, it's only available to certain licensed people. AND that libraries license a lot of this stuff on behalf of their user group. (Not just content, but sometimes services too). It's really that environment Umlaut is focused on; if that changed, what would be required would have little to do with Umlaut as it is now, I think. But I don't think anyone anticipates that changing anytime soon, and I don't think that's what Karen means by 'wide open web environment.' So if that continues to be the case, I think Umlaut has a role working pretty much as it does now. (Maybe I'm not sufficiently forward-thinking). I will admit that, while I come across lots of barriers in implementing Umlaut, I have yet to come across anything that makes me think this would be a lot easier if only there was more RDF. Maybe it's a failure of imagination on my part. More downloadable data, sure. More http APIs, even more so. And Umlaut already takes advantage of such things, especially the APIs more than the downloadable data (it turns out it's a lot more 'expensive' to try to download data and do something with it yourself, compared to using an API someone else provides to do the heavy lifting for you). But has it ever been much of a problem that the data is in some format other than RDF, such that it would be easier in RDF? Not from my perspective, not really. 
(In some ways, RDF is harder to deal with than other formats, from where I'm coming from. If a service does offer data in RDF triples as well as something else, I'm likely to choose the something else). This may be ironic because Umlaut is very concerned with 'linking data', in the sense of figuring out whether this record from the local catalog represents 'the same thing' as this record from Amazon, as this record from Google Books, or HathiTrust. Or whether this citation that came in as an OpenURL represents the 'same thing' as a record in a vendor database, or Mendeley, or whatever. There are real barriers in making this determination; they wouldn't be solved if everything was just in RDF, but they _would_ be solved if there were more consistent use of identifiers, for sure. I DO think this would be easier if only there were more consistent use of identifiers all the time. That experience with Umlaut is also what leads me to believe that the WEMI ontology is not only not contradictory to linked data applications, but _crucial_ for them. Without it, it's very hard to tell when something is the same thing. There are lots of times Umlaut ends up saying "Okay, I found something that I _think_ is at least an edition of the same thing you care about, but I really can't tell you if it's the _same_ edition you are interested in or not." So, yeah, Umlaut would work _better_ with more widespread use of identifiers, and even better with consistent use of common identifiers. I guess that's maybe where RDF could come in, in expressing determinations people have made that this identifier in system X represents the same 'thing' as this other identifier in system Y (someone would still have to MAKE those determinations; RDF would just be one way to then convey that determination, and I wouldn't particularly care if it was conveyed in RDF or something else). So anyway, it would work better with some of that stuff, but would it work substantially _differently_? Not so much. 
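For what it's worth, one way such a cross-system equivalence assertion could be conveyed is an owl:sameAs triple -- all the URIs below are invented for illustration, and somebody would still have to make the determination the triple records:

```
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# Assertion that an identifier in system X and a local catalog record
# identify the same thing (both URIs are hypothetical examples).
<http://example.org/systemX/id/12345678>
    owl:sameAs <https://catalog.example.edu/record/99999> .
```

The hard part is producing that assertion in the first place; the serialization, RDF or otherwise, is the easy part.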
Ah, if web pages started having more embedded machine-readable data with citations and identifiers of what is being looked at (microdata, RDFa, whatever), that would make it easier to get a user from some random web page _to_ an institution's Umlaut; that's one thing that would be nice. You may (or may not) find the "What is Umlaut, Anyway?" article on the Umlaut wiki helpful. https://github.com/team-umlaut/umlaut/wiki/What-is-Umlaut-anyway And there's really not much to understand about 'link resolvers' for these purposes, except that there's this thing called OpenURL (really bad name), which is really just a way for one website to hyperlink to another website and pass a machine-readable citation to it. The application receiving the machine-readable citation then tries to get the user to appropriate access or services for it, with regard to institutional entitlements. That's about it; if you understand that, you understand enough. Except that most
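To make "a way for one website to hyperlink to another website and pass a machine-readable citation" concrete, here's a minimal sketch in ruby of what constructing such a link amounts to. The resolver base URL and the citation are invented for illustration; real OpenURL 1.0 has a lot more ceremony than this:

```ruby
require "uri"

# An OpenURL is, in practice, an ordinary hyperlink to an institution's
# link resolver with a citation encoded in the query string.
# This resolver base URL is a hypothetical example.
RESOLVER_BASE = "https://resolver.example.edu/openurl"

def openurl_link(citation)
  # KEV ("key encoded value") style parameters, following the OpenURL 1.0
  # convention of prefixing referent metadata keys with "rft."
  params = { "url_ver" => "Z39.88-2004" }
  citation.each { |key, value| params["rft.#{key}"] = value }
  "#{RESOLVER_BASE}?#{URI.encode_www_form(params)}"
end

link = openurl_link("genre"  => "book",
                    "btitle" => "Example Title",
                    "isbn"   => "9780000000002")
```

Any site that can build a link like that can hand the user off to the institution's resolver, which then figures out entitlements and services.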
Re: [CODE4LIB] Local catalog records and Google, Bing, Yahoo!
On 2/23/2012 5:35 PM, Stephen Hearn wrote: But there's a catch--when WorldCat redirects a search to the selected local library catalog, it targets the OCLC record number. If the holding library has included the OCLC record number in its indexed data, the user goes right to the desired record. If not, the user is left wondering why the title of interest turned into some mysterious number and the search failed. I've been wishing OCLC would change this for a while. When specifying WorldCat's redirects for your local catalog, it's already possible to NOT specify an OCLCnum-based search, but only specify an ISBN, ISSN, etc search. If you do this, and the record HAS an (e.g.) ISBN, it'll redirect to an ISBN search in your catalog. But if the record doesn't have an ISBN, ISSN, etc, I think it'll just redirect to your catalog home page. So WorldCat is already capable of redirecting to an ISBN search. But if you config the OCLCnum search, it seems it'll always use it instead. I wish WorldCat instead would do the ISBN search if there is an ISBN, do an ISSN search if there's an ISSN, and only resort to the OCLCnum search if there's no ISBN or ISSN to search on. Or at least that could be a configurable option. It would result in a greater proportion of successful 'hits' when redirecting to the local catalog, which may not have an OCLCnum in it for every single record that it possibly could. (For that matter, what about when there are multiple OCLCnums, multiple records, for the same manifestation? For instance, a German-language cataloging record and an English-language cataloging record, for the exact same manifestation, have different OCLCnums. Will OCLC ever send the German-language cataloging record's OCLCnum and miss because you had the English-language one? I dunno). Anyhow, I've tried making this suggestion before to relevant OCLC people, but it's possible I never found the relevant OCLC people. 
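The fallback order I'm wishing for is simple enough to write down as plain logic. This is a sketch of what I wish WorldCat's redirect did, not what it does; the record hash and the local catalog's search-URL shapes are invented:

```ruby
# A hypothetical local catalog base URL.
CATALOG_BASE = "https://catalog.example.edu"

# Prefer identifiers the local catalog is most likely to have indexed,
# and only fall back to OCLC number as a last resort.
def redirect_url(record)
  if record[:isbn]
    "#{CATALOG_BASE}/search?isbn=#{record[:isbn]}"
  elsif record[:issn]
    "#{CATALOG_BASE}/search?issn=#{record[:issn]}"
  elsif record[:oclcnum]
    "#{CATALOG_BASE}/search?oclc=#{record[:oclcnum]}"
  else
    CATALOG_BASE # nothing usable to search on; send them to the catalog home page
  end
end

url = redirect_url(isbn: "9780000000002", oclcnum: "12345678")
```

Even with an OCLCnum present, the ISBN wins, which is the whole point.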
It's kind of hard to figure out how to make such feature suggestions to OCLC in a way that won't just be dropped on the floor (not sure it's possible, in fact). Jonathan Stephen On Thu, Feb 23, 2012 at 4:11 PM, David Friggens <frigg...@waikato.ac.nz> wrote: why local library catalog records do not show up in search results? Basically, most OPACs are crap. :-) There are still some that don't provide persistent links to record pages, and most are designed so that the user has a session and gets kicked out after 10 minutes or so. These issues were part of Tim Spalding's message that as well as joining web 2.0, libraries also need to join web 1.0. http://vimeo.com/user2734401 We don't allow crawlers because it has caused serious performance issues in the past. Specifically (in our case at least), each request creates a new session on the server which doesn't time out for about 10 minutes, thus a crawler would fill up the system's RAM pretty quickly. You can use Crawl-delay: http://en.wikipedia.org/wiki/Robots_exclusion_standard#Crawl-delay_directive You can set Google's crawl rate in Webmaster Tools as well. I've had this suggested before and thought about it, but never had it high up enough in my list to test it out. Has anyone actually used the above to get a similar OPAC crawled successfully and not brought down on its knees? David
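For what it's worth, the Crawl-delay suggestion amounts to a couple of lines of robots.txt. A sketch with invented paths -- and note that Crawl-delay is a non-standard directive that Google ignores (as David says, Google's rate is set in Webmaster Tools instead):

```
User-agent: *
# Ask compliant crawlers to wait 10 seconds between requests,
# to keep session build-up on the OPAC under control.
Crawl-delay: 10
# Keep crawlers out of session-spawning search URLs entirely
# (the path is hypothetical; every OPAC differs).
Disallow: /search
```

That still leaves the question of whether the record pages themselves can survive being crawled at all.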
Re: [CODE4LIB] Issue Tracker Recommendations
On 2/22/2012 5:10 PM, Sebastian Karcher wrote: Because Trac and Git have come up: Zotero has switched from Trac/SVN to Git and I (and I think everyone else involved) much prefer git, not least because of its better issue handling. I found Trac slow, clumsy, and ugly. I'm confused. Git is a source control repo, like svn. Trac is an issue tracker. Okay, you switched from svn to git (which to me seems somewhat orthogonal to issue trackers, although it's true that certain issue tracker software integrates with some version control systems and not others, like trac does with svn). But what are you using for issue tracking now, instead of Trac? git is not an issue tracker, so I'm not sure what you mean by git's better issue handling; git doesn't do issue handling (any more than svn does). Do you mean you're using github.com as your git host, and their issue tracker? Or something else? Jonathan If, as you say, the code repository function isn't important, there may very well be better products for issue tracking only, but between Trac and github the latter is imho much superior. On Wed, Feb 22, 2012 at 1:52 PM, Sarr, Nathan <ns...@library.rochester.edu> wrote: You might want to take a look at asana: http://asana.com/ -Nate Nathan Sarr Senior Software Engineer River Campus Libraries University of Rochester Rochester, NY 14627 (585) 275-0692 ns...@library.rochester.edu -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Cynthia Ng Sent: Wednesday, February 22, 2012 3:46 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Issue Tracker Recommendations Thanks for all the responses, everyone. If there are any more, I'd still like to hear them. Should probably add that 4) it's more for issue tracking/documentation i.e. 
code versioning/repository is not a priority right now (though it's great if it has that feature) There will be discussions with the rest of the team and we'll have to talk to the programmer/server admin to see what he thinks is easier to implement, but we're likely to go with Redmine or Trac based on recommendations/needs.
[CODE4LIB] more on returning partial HTML to javascript
A while ago we had a big debate/argument about whether it makes sense to return partial HTML snippets from ajax (or really, um, ajah, in this case?) requests from javascript; or whether instead modern apps should all move toward javascript MVC models with most logic in the js layer; or something in between; or maybe it depends on the app, heh. Anyway, I ran across (on reddit I think) this interesting blog post by the developers of the ruby Basecamp app, explaining how they use partial HTML returns to js, and avoid javascript MVC except in areas where the UI really requires it; and in particular how they then have to pay a lot of attention to server-side caching to get the very snappy performance they want. Some may find it interesting. http://37signals.com/svn/posts/3112-how-basecamp-next-got-to-be-so-damn-fast-without-using-much-client-side-ui
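For anyone who missed the earlier debate, the "partial HTML to js" approach is tiny in code terms: the server renders the same template partial it would use in a full page, and the ajax response body is just that snippet, which the client inserts into the DOM verbatim. A minimal sketch with ruby's stdlib ERB -- the template and data are invented, not from Basecamp:

```ruby
require "erb"

# A template partial the server might also use when rendering the full page.
# (Invented example; a real Rails app would keep this in a .erb file.)
PARTIAL = <<~ERB
  <li class="todo" id="todo-<%= todo[:id] %>"><%= todo[:title] %></li>
ERB

# The ajax endpoint's entire job: render the partial and return the HTML.
def render_todo_partial(todo)
  ERB.new(PARTIAL).result(binding)
end

snippet = render_todo_partial(id: 42, title: "Write the partial")
# The client side is then one line of jQuery, e.g.:
#   $.get("/todos/42/partial", function(html) { $("#todos").append(html); });
```

The interesting part of the Basecamp post is not this mechanism but the aggressive server-side caching wrapped around it.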
Re: [CODE4LIB] www.code4lib.org down?
On 2/20/2012 12:54 PM, Cary Gordon wrote: I could also put it on one of my servers. It needs a simple LAMP stack. I think that it requires PHP 5.2.x and might throw errors on 5.3.x. There are some other things running on that server besides Drupal. Including the 'planet' aggregator, the wiki, the custom dieboldotron voting software, possibly others. I don't know of anything that requires anything particularly complicated or non-standard, but when making 'requirements' keep in mind it's not just the Drupal.
[CODE4LIB] Pre confirm when where?
Is there some obvious way I'm not seeing to figure out when and where the preconference sessions are tomorrow? I can't seem to find it anywhere. I don't even know what time to wake up and go looking for them. I'm not even positive they are at the conference hotel?
Re: [CODE4LIB] code4lib 2012 streaming
Is the video also being recorded for putting up on the web later? On 2/1/2012 11:48 AM, Corey A Harper wrote: Dear All, I'll be managing our attempts to ensure code4lib 2012 is streamed. The plan is to stream all plenary portions of the conference via livestream, and I'll post the channel link to IRC, Twitter and this list before the event begins. If all goes well, we'll have a stream for the following (PST) times: * Tues: 9am-12pm, 1pm-2.40, 4-5.20 * Wed: 9am-12pm, 1pm-2.20, 3.50-5.15 * Thu: 9am-12pm The streaming committee has some concerns about the equipment we have access to, so if there is anyone in the community who would volunteer a digital camcorder with a firewire connection known to be compatible with Livestream, we would be in your debt. (Which means I would buy you beer from time to time throughout the conference...) Alternately, I have leads on rental equipment, so please let me know (offlist) if virtual attendees would be willing to donate toward the stream or if onsite attendees would be willing to make a donation at the door. :) Thanks in advance. I will post a link to the livestream channel no later than Monday. Best, -Corey On Mon, Jan 30, 2012 at 11:21 AM, Julia Bauder <julia.bau...@gmail.com> wrote: Speaking of video streaming, is there any information yet about the streaming? E.g., what will be streamed, and where will the links to the stream appear? Julia (who is also eagerly awaiting her streaming + IRC Code4Lib fix) On Mon, Jan 30, 2012 at 9:50 AM, Ranti Junus <ranti.ju...@gmail.com> wrote: Hello All, For those who might not realize it, the code4lib 2012 schedule is up. http://code4lib.org/conference/2012/schedule Once the conference is over, we'll work on adding the links to the presentations. Better yet, those of you who do the presentation can add the link to your own presentation (slides, screencast, code examples, etc.) You'd need to register for an account first, if you haven't done that. Have a great time, everyone! 
I'm looking forward to watching the video streaming and participating in the #code4lib IRC. thanks, ranti on behalf of code4lib 2012 program committee -- Bulk mail. Postage paid.
Re: [CODE4LIB] Koha in the Running
The only thing I can say is be careful of PTFS/LibLime as a vendor. There are other vendors that provide Koha support in the US, however. http://bibwild.wordpress.com/2011/09/20/koha-support-or-hosting-options/ On 1/12/2012 1:19 PM, todd.d.robb...@gmail.com wrote: Hello all, I'm curious to know this list's current thoughts on Koha as an ILS. Where would you rank it among the various options, open source and vendor? Cheers, Tod PS: If this has been addressed recently and I just happened to miss it in the archives: my apologies.
Re: [CODE4LIB] Obvious answer to registration limitations
On 1/11/2012 11:31 AM, Jim Safley wrote: I happen to know that Amanda French, THATCamp Coordinator, is interested in talking with the code4lib coordinators about the distributed conference model. Ah, but if you haven't figured it out yet, there's pretty much no such thing as 'code4lib coordinators'. If some people are interested in this, they should investigate; there's pretty much nobody who has authority to do it, or to tell you that you have authority.
[CODE4LIB] http://openurl.code4lib.org/ MIA
there used to be an http://openurl.code4lib.org/ . It's even linked to from a Wikipedia article on OpenURL. I seem to recall it had some useful stuff rsinger put there. It is now MIA. Anyone know what happened to it, and if it's easy to bring it back? rsinger? No big deal, just curious. Jonathan
[CODE4LIB] re-introducing Umlaut, again
An alpha release of Umlaut 3.0 is now available. Umlaut is an open source front-end for a link resolver, or: Umlaut is a just-in-time aggregator of last mile specific citation services, taking input as OpenURL, and providing an HTML UI as well as an api suite for embedding Umlaut services in other applications. What the heck does this mean? Read more: https://github.com/team-umlaut/umlaut/wiki/What-is-Umlaut-anyway The 3.0 release of Umlaut will not add any new features, but instead modernizes Umlaut's architecture to be based on Rails 3.1+ as an engine gem, and work on modern ruby versions. Lots of unsupported cruft was also removed from the codebase. (Umlaut actually began as a Rails 1.x application!). Why this matters to you is that Umlaut should be easier to install and maintain than it ever was before. See Installation/Getting Started instructions: https://github.com/team-umlaut/umlaut/wiki/Installation This is still an alpha release at present. It likely has some not yet discovered bugs, missing features, or performance issues. But it should be much easier to work with than Umlaut 2.x; if you are looking to get started with Umlaut, definitely start with the 3.x alpha. Alpha tester feedback very welcome; please let me know of any difficulties you have with it, suggestions, questions, etc. Umlaut 3.x source code is available in the umlaut3dev branch in the github project: https://github.com/team-umlaut/umlaut/tree/umlaut3dev (eventually it will move to master).
Re: [CODE4LIB] Q: best practices for *simple* contributor IP/licensing management for open source?
Thanks! I wasn't wanting to invent something new, I was just having trouble finding any lightweight processes via googling, thus I figured I'd ask you all. I'll definitely spend some time checking out the DCO process. Hopefully the documents used in it are licensed (creative commons or something?) such that other projects can re-use 'em? On 12/14/2011 9:56 PM, Dan Scott wrote: Trying to post inline in GroupWise, apologies if it ends up looking like crap... I'm imagining something where each contributor/accepted-pull-request-submitter basically just puts a digital file in the repo, once, that says something like "All the code I've contributed to this repo in past or future, I have the legal ability to release under license X, and I have done so." And then I guess in the License file, instead of saying "copyright Original Author", it would be like "copyright by various contributors, see files in ./contributors to see who." I wouldn't suggest imagining new things when it comes to legal issues ;) I would suggest considering the Developer Certificate of Origin (DCO) process as adopted by the Linux project and others (including Evergreen). When Evergreen was in the process of joining the Software Freedom Conservancy, that process was considered acceptable practice (IIRC, the Software Freedom Law Center did take a glance) - no doubt in part because it is a well-established practice. And talk about lightweight; using the git Signed-off-by tag indicates that you've read the DCO and agree to its terms. For a recent discussion and description of the DCO (in the context of the Project Harmony discussions, which were focused primarily on the much heavier-weight CLA processes), see http://lists.harmonyagreements.org/pipermail/harmony-drafting/2011-August/99.html for example.
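For reference, the entire 'lightweight' part of the DCO workflow Dan describes is one trailer line in the commit message, which `git commit -s` appends automatically from your git config (the name and address here are invented, reusing the hypothetical contributor from my blog post):

```
Signed-off-by: Carrie Coder <carrie@example.org>
```

That line, plus the DCO text sitting in the repo, is the whole paper trail.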
Re: [CODE4LIB] conference voting and registration
On 12/15/2011 6:07 PM, Francis Kayiwa wrote: Perhaps it has reached a point where regional ones will be the way to go as more and more people get left out. I say if you get left out, plan to run your $local code4lib to make up for it. Yep, that'd be the party line. You know Code4Lib was started only, what, 6 years ago, by a bunch of random coders who just said Hey, let's put on a conference, why not? It's gotten harder to put on since then, but the first one was pretty seat-of-the-pants (I understand; I wasn't there, although I was at the 2nd). If you're unhappy that you can't get into code4lib, start your own that you can get into!
Re: [CODE4LIB] conference voting and registration
On 12/15/2011 6:32 PM, Cary Gordon wrote: Pretty much any volunteer position guarantees you a spot. It is up to the organizers to figure out what they need help with. I do not think this is true. Pretty sure Kyle just said as much for this year. I don't think it's been true in past years either. But I think the old record for selling out was 4 days, not one hour, so anyone involved in volunteering probably just signed up the usual way and got in in the past.
[CODE4LIB] Q: best practices for *simple* contributor IP/licensing management for open source?
Also posted on my blog at: http://bibwild.wordpress.com/2011/12/14/practices-for-simple-contributor-management/ So, like many non-huge non-corporate-supported open source projects, many of the open source projects I contribute to go something like this (some of which I was original author, others not): * Someone starts the project in a publicly accessible repo. * If she works for a company, in the best case she got permission from her employer (who may or may not own copyright to code she writes) to release it as open source. * She sticks some open source License file in the repo saying “copyright Carrie Coder” and/or the name of the employer. Okay, so far so good, but then: * She adds someone else as a committer, who starts committing code. And/or accepts pull requests on github etc, committing code by other authors. * Never even thinks about licensing/intellectual property issues. What can go wrong? * Well, the license file probably still says ‘copyright Carrie Coder’ or ‘copyright Acme Inc’, even though the code by other authors has copyright held by them (or their employers). So right away something seems not all on the up and up. * One of those contributors can later be like “Wait, I didn’t mean to release that open source, and I own the copyright, you don’t have my permission to use it, take it out.” * Or worse, one of the contributors’ employers can assert they own the copyright and did not give permission for it to be released open source, and you don’t have permission to use it (and neither does anyone else that’s copied or forked it from you). == Heavy weight solutions So there’s a really heavy-weight solution to this, like the Apache Foundation uses in their Contributor License Agreement. This is something people have to actually print out and sign and mail in. Some agreements like this actually transfer the copyright to some corporate entity, presumably so the project can easily re-license under a different license later. 
(I thought Apache did this, but apparently not). This is kind of too much overhead for a simple non-corporate-sponsored open source project. Who’s going to receive all this mail, and where are they going to keep the contracts? There is no corporate entity to be granted a non-exclusive license to do anything. (And the hypothetical project isn’t nearly so important or popular as to justify trying to get umbrella stewardship from Apache or the Software Freedom Conservancy or whatever. If it were, the Software Freedom Conservancy is a good option, but it’s still too much overhead for the dozens of different tiny-to-medium sized projects anyone may be involved in.) Even as far as individuals go, over the life of the project who the committers are may very well change, and not include the original author(s) anymore. And you don’t want to make someone print out, sign, and wait for you to receive something before accepting their commits; that’s not internet-speed. == Best practices for a simpler solution that’s not nothing? So doing it ‘right’ with that heavy-weight solution is just way too much trouble, so most of us just keep ignoring it. But is there some lighter-weight better-than-nothing probably-good-enough approach? I am curious if anyone can provide examples, ideally lawyer-vetted examples, of doing this much simpler. Most of my projects are MIT-style licensed, which already says “do whatever the heck you want with this code”, so I don’t really care about being able to re-license under a different license later (I don’t think I do? Or maybe even the MIT license would already allow anyone to do that). So I definitely don’t need and really can’t handle paper print-outs. 
I’m imagining something where each contributor/accepted-pull-request-submitter basically just puts a digital file in the repo, once, that says something like “All the code I’ve contributed to this repo in past or future, I have the legal ability to release under license X, and I have done so.” And then I guess in the License file, instead of saying ‘copyright Original Author’, it would be like ‘copyright by various contributors, see files in ./contributors to see who.’ Does something along those lines end up working legally, or is it worthless, no better than just continuing to ignore the problem, so you might as well just continue to ignore the problem? Or if it is potentially workable, does anyone have examples of projects using such a system, ideally with some evidence some lawyer has said it’s worthwhile, including a lawyer-vetted digital contributor agreement? Any ideas?
Re: [CODE4LIB] Sending html via ajax -vs- building html in js (was: jQuery Ajax request to update a PHP variable)
On 12/8/2011 9:27 AM, Bill Dueber wrote: To these I would add: * Reuse. The call you're making may be providing data that would be useful in other contexts as well. If you're generating application-specific html, that can't happen. Well, if the other contexts are Javascript, and your HTML is nicely semantically structured with good classes and IDs, it actually ends up being just about as easy getting the data out of HTML with JQuery selectors as it would be with JSON. This is kind of the direction of HTML5 microdata/schema.org --- realizing that properly structured semantic HTML can be pretty damn machine readable, so if you do that you can get human HTML and machine readability with only one representation, instead of having to maintain multiple representations. (In some of the scenarios we're talking about, there are potentially THREE representations to maintain -- server-side generated HTML, server-side generated JSON, AND js to turn the JSON into HTML). But yeah, there are always lots of trade-offs. This particular question I think ends up depending a lot on what choices you've made for the REST of your software stack. Each choice at each level has different trade-offs, but the most important thing is probably to reduce the 'impedance' of making inconsistent choices in different places. That is -- if you're heavy into client-side JS app framework rendering of HTML already, then sure, you've made your choice, stick to it.
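To illustrate the "semantic HTML is itself machine-readable" point: given markup with meaningful classes, pulling the data back out is one selector query. Client-side you'd do this with jQuery; here stdlib REXML stands in for the same idea, on an invented holdings snippet:

```ruby
require "rexml/document"

# An invented, well-structured HTML fragment of library holdings.
html = <<~HTML
  <ul id="holdings">
    <li class="holding"><span class="location">Main Stacks</span>
        <span class="call-number">Z678.9 .A345</span></li>
    <li class="holding"><span class="location">Offsite</span>
        <span class="call-number">QA76.76 .H94</span></li>
  </ul>
HTML

doc = REXML::Document.new(html)
# One XPath query recovers the data, no separate JSON representation needed.
# (jQuery equivalent: $("#holdings .location").map(...))
locations = REXML::XPath.match(doc, "//span[@class='location']").map(&:text)
```

The same single HTML representation serves the human reader and the scraper.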
Re: [CODE4LIB] Sending html via ajax -vs- building html in js (was: jQuery Ajax request to update a PHP variable)
On 12/8/2011 11:19 AM, Robert Sanderson wrote: If you blindly include whatever you get back directly into the page, it might include either badly performing, out of date, or potentially malicious script tags that subsequently destroy the page. It's the equivalent of blindly accepting web form input into an SQL query and then wondering where your tables all disappeared off to. Hmm, I'm not sure it's the _equivalent_ -- isn't JS (especially JS you wrote) only going to be getting HTML from servers running software you wrote/controlled? Even if a server is just adding HTML to a page (no JS involved), it COULD be subject to an HTML injection attack, if the server is basing the HTML on user input without properly sanitizing it. I don't think the fact that you've split the logic between the server and the JS necessarily changes things. It's essentially just a 'remote procedure call'. The server is STILL responsible for delivering secure HTML -- exactly as it was when there was no JS involved at all, no? Now, granted, it is a more complicated environment when there's JS involved, so there is more chance for a security bug. But I wouldn't say it's the equivalent of blindly accepting web form input: whether JS is involved or not, if the server is generating HTML, it's the server's job to _not_ blindly accept web form input and stick it into HTML. If you have your JS asking _untrusted sources_ (instead of your own server) for HTML, then that might be a different story.
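The server-side duty described here can be sketched in a few lines of JavaScript (escapeHtml and renderSnippet are hypothetical names; a minimal illustration, not a complete sanitization library -- a real app should use its framework's built-in escaping):

```javascript
// Hypothetical sketch: whichever component generates the HTML fragment
// must escape untrusted input before embedding it -- the same duty it
// has for a full-page render, AJAX or not.
function escapeHtml(untrusted) {
  return String(untrusted)
    .replace(/&/g, '&amp;')   // must come first, so later entities survive
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// The fragment returned to the AJAX caller is built only from escaped input.
function renderSnippet(userQuery) {
  return '<div class="result">' + escapeHtml(userQuery) + '</div>';
}
```

The point is that the escaping lives wherever the HTML is generated; moving that generation behind an AJAX call changes nothing about the responsibility.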
Re: [CODE4LIB] Sending html via ajax -vs- building html in js (was: jQuery Ajax request to update a PHP variable)
On Thu, Dec 8, 2011 at 9:11 AM, Godmar Back god...@gmail.com wrote: If we tell newbies (no offense meant by that term) that AJAX means send a request and then insert a chunk of HTML in your DOM, we're short-changing their view of the type of Rich Internet Application (RIA) AJAX today is equated with. Sure, fair point -- I just don't think there's anything wrong with that approach, but I would not want to tell newbies it's all AJAX is, either. But if we tell newbies that javascript communication with the server should _always_ mean sending JSON, and that sending HTML is unfashionable and they should never do it, I also think we're short-changing their view, and giving them cargo-cult trend-following approaches. I think there are plenty of scenarios where either approach is justified and appropriate. It depends on the context, it depends on the rest of your stack, it depends on what's going on. There is no substitute for actual thought and analysis and decision.
Re: [CODE4LIB] Sending html via ajax -vs- building html in js (was: jQuery Ajax request to update a PHP variable)
A fair number? Anyone but Godmar? On 12/7/2011 5:02 PM, Nate Vack wrote: OK. So we have a fair number of very smart people saying, in essence, it's better to build your HTML in javascript than send it via ajax and insert it. So, I'm wondering: Why? Is it an issue of data transfer size? Is there a security issue lurking? Is it tedious to bind events to the new / updated code? Something else? I've thought about it a lot and can't think of anything hugely compelling... Thanks! -Nate
Re: [CODE4LIB] marc in json
The reason some of us want marc in JSON has absolutely nothing to do with sending json mime type over http and viewing it in a browser with jsonovich or whatever. (In fact, marc-in-json is possibly LESS human readable than marcxml, or at any rate no more so). It's to escape the limits and ease of corruption of ISO Marc21 binary (maximum length, directory/headers that can easily become corrupt, unpredictable char encoding issues), but with a more compact (in bytes) and quicker/easier-to-parse representation than XML. But yes, it's awesome that we have parsers in several languages that will now read compatible marc-in-json formats. But sometimes you need to send more than one MARC record at once ('at once' can mean a file, or a network/inter-process stream). 'More than one' can range from 2 to dchud's 'a ton'. So the next step is a common format for more than one of these things. It would be useful, yes. From: Daniel Chudnov [daniel.chud...@gmail.com] Sent: Wednesday, December 07, 2011 7:27 PM To: Code for Libraries Cc: Jonathan Rochkind Subject: Re: [CODE4LIB] marc in json On 12/1/2011 3:24 PM, Jonathan Rochkind wrote: newline-delimited is certainly one simple solution, even though the aggregate file is not valid JSON. Does it matter? Not sure if there are any simple solutions that still give you valid JSON, but if there aren't, I'd rather sacrifice valid JSON (that it's unclear if there's any important use case for anyway), than sacrifice simplicity. That's the same question - does it matter? - that I had reading this thread. If you have a ton of records to pack into a file, are the advantages of sending a json mime type over http and viewing it in a browser with jsonovich or whatever worth it when it's a really big file anyway? Seems that having 3-4 parsers that share the exact same idea of how to read/write individual records is the main story, and a great step forward. +1 to y'all for getting this done.
-1 to me for never following through with my half-done pymarc implementation at c4lc '09 or whenever it was. -Dan
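The newline-delimited idea discussed above can be sketched quickly (a hedged JavaScript illustration; the toy records in the test only gesture at a real marc-in-json field layout):

```javascript
// Sketch of newline-delimited marc-in-json: one JSON document per line.
// The aggregate file is not itself valid JSON, but each line is, and a
// streaming reader never needs the whole file in memory at once.
function writeRecords(records) {
  return records.map(function (r) { return JSON.stringify(r); }).join('\n');
}

function readRecords(text) {
  return text.split('\n')
    .filter(function (line) { return line.trim().length > 0; })
    .map(function (line) { return JSON.parse(line); });
}
```

Because JSON.stringify never emits raw newlines inside a document, splitting on '\n' is a safe framing for any number of records, from 2 up to 'a ton'.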
Re: [CODE4LIB] Sending html via ajax -vs- building html in js (was: jQuery Ajax request to update a PHP variable)
Also, I've thought of a good reason myself: performance. If I'm adding an item to a list, it's a better user experience to update the display immediately rather than waiting for the server to send back a 200 OK, and handle the error or timeout case specially. While in general I tend toward the other thing you said -- Does it make sense to replicate the server-side functionality on the client? -- I think what you propose above is legit. MOST people don't write interfaces like that, even in js. That is, an interface that will update the user interface even before/without receiving _anything_ back from the server. (But, in the best cases, produce an error message and/or 'undo' the user interface action if the server does later get back with an error/failure message). So if you're going to do that, then -- it kind of doesn't matter if the server sends back HTML or JSON or anything else, the user interface is updating before/without getting _anything_ from the server. But to the extent the server's response then serves pretty much only as a notification-of-failure or whatever, yeah, JSON is the way to go. So, yeah, if you're going to go all the way there, that's a pretty cool thing (if you can make sure the failure conditions are handled acceptably) -- sure, go for it.
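That update-first pattern can be sketched with the display and server abstracted into plain callbacks (all names here are hypothetical; a real version would also have to guard against the server response racing the local update):

```javascript
// Sketch of an optimistic UI update: change the "display" immediately,
// then undo and report if the server later signals failure.
function optimisticAdd(list, item, saveOnServer, showError) {
  list.push(item);                          // update right away, before any response
  saveOnServer(item, function (err) {
    if (err) {                              // server failed: undo and notify
      list.splice(list.indexOf(item), 1);
      showError(err);
    }
  });
}
```

Note that the server's reply is used only as a success/failure signal, which is exactly why a small JSON status is the natural response format in this style.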
Re: [CODE4LIB] jQuery Ajax request to update a PHP variable
Is it too late to dedicate a presentation slot to a performance? (Whoa, actually, seriously, a Code4Lib talent show would be AWESOME.) The rails conf in baltimore a couple years ago had an evening jam session slot. Sadly, it's really a pain bringing the accordion on an airplane.
Re: [CODE4LIB] jQuery Ajax request to update a PHP variable
I'll admit I haven't spent a lot of time investigating/analyzing this particular application -- it's quite possible an all-JS app is the right choice here. I was just responding to the suggestion that returning HTML to AJAX was out of style and shouldn't be done anymore; with the implication I picked up that (nearly) ALL apps should be almost all JS, use JS templating engines, etc., that this is the right new way to write web apps. I think this sends the wrong message to newbies. It's true that it is very trendy these days to write all-JS apps, which, if they function at all without JS, do so with a completely separate codepath (this is NOT progressive enhancement, although it is a way of ensuring non-JS accessibility). Yeah, it's trendy, but I think it's frequently (though not always, true) the wrong choice when it's done. If you do provide a completely separate codepath for non-JS, this can be harder to maintain than actual progressive enhancement. And pure JS either way can easily make your app a poor citizen of the web -- harder to screen-scrape or spider, harder to find URLs to link to certain parts of the app, etc. (eg, http://www.tbray.org/ongoing/When/201x/2011/02/09/Hash-Blecch ) But, sure, maybe in this particular case pure-JS is a good way to go, I haven't spent enough time looking at or thinking about it to have an opinion. Sure, if you've already started down the path of using a JS templating/view-rendering engine, and that's something you want/need to do anyway, you might as well stick to it, I guess. I just reacted to the suggestion that doing anything _but_ this is out of style, or an old bad way of doing things. If writing apps that produce HTML with progressive enhancement is out of style, then I don't want to be fashionable!
From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Godmar Back [god...@gmail.com] Sent: Tuesday, December 06, 2011 9:34 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] jQuery Ajax request to update a PHP variable On Tue, Dec 6, 2011 at 8:38 AM, Erik Hatcher erikhatc...@mac.com wrote: I'm with jrock on this one. But maybe I'm a luddite that didn't get the memo either (but I am credited for being one of the instrumental folks in the Ajax world, heh - in one or more of the Ajax books out there, us old-timers called it remote scripting). On the in-jest rhetorical front, I'm wondering if referring to oneself as an old-timer helps in defending against insinuations that opposing technological change makes one a defender of the old ;-) But: What I hate hate hate about seeing JSON being returned from a server for the browser to generate the view is stuff like: string = div + some_data_from_JSON + /div; That embodies everything that is wrong about Ajax + JSON. That's exactly why you use new libraries such as knockout.js, to avoid just that. Client-side template engines with automatic data-bindings. Alternatively, AJAX frameworks use JSON and then interpret the returned objects as code. Take a look at the client/server traffic produced by ZK, for instance. As Jonathan said, the server is already generating dynamic HTML... why have it return ... It isn't. There is no server already generating anything; it's a new app Nate is writing. (Unless you count his work of the past two days). The dynamic HTML he's generating is heavily tailored to his JS. There's extremely tight coupling, which now exists across multiple files written in multiple languages. Simply avoidable bad software engineering. That's not even making the computational cost argument that avoiding template processing on the server is cheaper.
And with respect to Jonathan's argument of degradation, a degraded version of his app (presumably) would use a table - or something like that; it'd look nothing like what he showed us yesterday. Heh - the proof of the pudding is in the eating. Why don't we create 2 versions of Nate's app, one with mixed server/client - like the one he's completing now - and I create the client-side based one, and then we compare side by side? I'll work with Nate on that. - Godmar [ I hope it's ok to snip off the rest of the email trail in my reply. ]
Re: [CODE4LIB] jQuery Ajax request to update a PHP variable
On 12/6/2011 1:42 PM, Godmar Back wrote: Current trends certainly go in the opposite direction, look at jQuery Mobile. Hmm, jQuery Mobile still operates on valid and functional HTML delivered by the server. In fact, one of the design goals of jQuery Mobile is indeed to degrade to a non-JS version on feature phones (you know, eg, flip phones with a web browser but probably no javascript). The non-JS version it degrades to is the same HTML that was delivered to the browser either way, just not enhanced by jQuery Mobile. If I were writing AJAX requests for an application targeted mainly at jQuery Mobile, I'd be likely to still have the server deliver HTML to the AJAX request, then have js insert it into the page and trigger jQuery Mobile enhancements on it.
Re: [CODE4LIB] Models of MARC in RDF
On 12/5/2011 1:40 PM, Karen Coyle wrote: This brings up another point that I haven't fully grokked yet: the use of MARC kept library data consistent across the many thousands of libraries that had MARC-based systems. Well, only somewhat consistent, but, yeah. What happens if we move to RDF without a standard? Can we rely on linking to provide interoperability without that rigid consistency of data models? Definitely not. I think this is a real issue. There is no magic to linking or RDF that provides interoperability for free; it's all about the vocabularies/schemata -- whether in MARC or in anything else. (Note that different national/regional library communities used different schemata in MARC, which made interoperability infeasible there. Some still do, although gradually people have moved to Marc21 precisely for this reason, even when Marc21 was less powerful than the MARC variant they started with). That is to say, if we just used MARC's own implicit vocabularies, but output them as RDF, sure, we'd still have consistency, although we wouldn't really _gain_ much. On the other hand, if we switch to a new better vocabulary -- we've got to actually switch to a new better vocabulary. If it's just whatever anyone wants to use, we've made it VERY difficult to share data, which is something pretty darn important to us. Of course, the goal of the RDA process (or one of 'em) was to create a new schema for us to consistently use. That's the library community effort to maintain a common schema that is more powerful and flexible than MARC. If people are using other things instead, apparently that failed, or at least has not yet succeeded.
Re: [CODE4LIB] jQuery Ajax request to update a PHP variable
I'm not sure what you're trying to do makes sense. You'd have to write some PHP code to receive the AJAX request and use it to update the variable. There's nothing in PHP that will do this automatically. However, since, I believe, PHP variables are usually only 'in scope' for the context of the request, I'm not sure _what_ variable you are trying to update. I suppose you could update a session variable, and that might make sense. But it doesn't sound like that's what you're trying to do; it sounds like what you're trying to do is something fundamentally impossible. If you have a PHP script with $searchterm = 'drawing'; in it, then that statement gets executed (setting $searchterm to 'drawing') every time the PHP script gets executed -- which is every time a request is received that executes that PHP script. It doesn't matter what some _other_ request did, and an AJAX request is just some other request. You can't use AJAX to change your source code. (Or, I suppose, there would be SOME crazy way to do that, but you definitely definitely wouldn't want to!). On 12/5/2011 5:08 PM, Nate Hill wrote: If I have in my PHP script a variable... $searchterm = 'Drawing'; And I want to update 'Drawing' to be 'Cooking' w/ a jQuery hover effect on the client side then I need to make an Ajax request, correct? What I can't figure out is what that is supposed to look like... something like... $.ajax({ type: "POST", url: "myfile.php", data: ...not sure how to write what goes here to make it 'Cooking'... }); Any ideas?
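The scoping point can be illustrated (in JavaScript rather than PHP, with a hypothetical per-user session object standing in for PHP's session): every request re-runs the handler and re-initializes its locals, so only explicitly persisted state survives between requests.

```javascript
// Each request re-executes this handler, so `searchterm` is reset every
// time -- just like $searchterm in a PHP script. Only the (hypothetical)
// session object carries state across requests.
function handleRequest(session, params) {
  var searchterm = 'drawing';              // re-initialized on every request
  if (params.newterm) {
    session.searchterm = params.newterm;   // this assignment persists
  }
  return { local: searchterm, stored: session.searchterm };
}
```

So an AJAX request can change what the session stores, but it can never change what the next execution of the script assigns to its own local variable.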
Re: [CODE4LIB] jQuery Ajax request to update a PHP variable
I still like sending HTML back from my server. I guess I never got the message that that was out of style, heh. My server application already has logic for creating HTML from templates, and quite possibly already creates this exact same piece of HTML in some other place, possibly for use with non-AJAX fallbacks, or some other context where that snippet of HTML needs to be rendered. I prefer to re-use this logic that's already on the server, rather than have a duplicate HTML generating/templating system in the javascript too. It's working fine for me, in my use patterns. Now, certainly, if you could eliminate any PHP generation of HTML at all, as I think Godmar is suggesting, and basically have a pure Javascript app -- that would be another approach that avoids duplication of HTML generating logic in both JS and PHP. That sounds fine too. But I'm still writing apps that degrade if you have no JS (including for web spiders that have no JS, for instance), and have nice REST-ish URLs, etc. If that's not a requirement and you can go all JS, then sure. But I wouldn't say that making apps that use progressive enhancement with regard to JS, and degrade fine if you don't have JS, is out of style -- or if it is, it ought not to be! Jonathan On 12/5/2011 6:31 PM, Godmar Back wrote: FWIW, I would not send HTML back to the client in an AJAX request - that style of AJAX fell out of favor years ago. Send back JSON instead and keep the view logic client-side. Consider using a library such as knockout.js. Instead of your current (difficult to maintain) mix of PHP and client-side JavaScript, you'll end up with a static HTML page, a couple of clean JSON services (one for checked-out per subject, and one for the syndetics ids of the first 4 covers), and clean HTML templates. You had earlier asked the question whether to do things client- or server-side -- well, in this example, the correct answer is to do it client-side.
(Yours is a read-only application, where none of the advantages of server-side processing applies.) - Godmar On Mon, Dec 5, 2011 at 6:18 PM, Nate Hill nathanielh...@gmail.com wrote: Something quite like that, my friend! Cheers N On Mon, Dec 5, 2011 at 3:10 PM, Walker, David dwal...@calstate.edu wrote: I gotcha. More information is, indeed, better. ;-) So, on the PHP side, you just need to grab the term from the query string, like this: $searchterm = $_GET['query']; And then in your JavaScript code, you'll send an AJAX request, like: http://www.natehill.net/vizstuff/catscrape.php?query=Cooking Is that what you're looking for? --Dave - David Walker Library Web Services Manager California State University -----Original Message----- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Nate Hill Sent: Monday, December 05, 2011 3:00 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] jQuery Ajax request to update a PHP variable As always, I provided too little information. Dave, it's much more involved than that. I'm trying to make a kind of visual browser of popular materials from one of our branches from a .csv file. In order to display book covers for a series of searches by keyword, I query the catalog, scrape out only the syndetics images, and then display 4 of them. The problem is that I've hardcoded in a search for 'Drawing', rather than dynamically pulling the correct term and putting it into the catalog query. Here's the work in process, and I believe it will only work in Chrome right now. http://www.natehill.net/vizstuff/donerightclasses.php I may have a solution; Jason's idea got me part way there. I looked all over the place for that little snippet he sent over! Thanks! On Mon, Dec 5, 2011 at 2:44 PM, Walker, David dwal...@calstate.edu wrote: And I want to update 'Drawing' to be 'Cooking' w/ a jQuery hover effect on the client side then I need to make an Ajax request, correct?
What you probably want to do here, Nate, is simply output the PHP variable in your HTML response, like this: <h1 id="foo"><?php echo $searchterm ?></h1> And then in your JavaScript code, you can manipulate the text through the DOM like this: $('#foo').html('Cooking'); --Dave - David Walker Library Web Services Manager California State University -----Original Message----- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Nate Hill Sent: Monday, December 05, 2011 2:09 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: [CODE4LIB] jQuery Ajax request to update a PHP variable If I have in my PHP script a variable... $searchterm = 'Drawing'; And I want to update 'Drawing' to be 'Cooking' w/ a jQuery hover effect on the client side then I need to make an Ajax request, correct? What I can't figure out is what that is supposed to look like... something like... $.ajax({ type: "POST", url: "myfile.php", data: ...not sure how to write what goes here to make it 'Cooking'... }); Any ideas? -- Nate Hill
Re: [CODE4LIB] Pandering for votes for code4lib sessions
I would also mention that we generally expect people voting to either plan to at least potentially attend the conference, or have a prior participation/affiliation/interest in the Code4Lib community. We're not expecting random people to be voting just for the hell of it, or to help out a friend with a proposal. (I also don't think the 'incident' of 'vote pandering' is all that awful, or that there was much reason for the 'perpetrator' to have expected anyone would have a problem with it. I do think that when we have a system of open voting like we have, we should have a statement of what we expect from voters that they have to read before voting, which will keep people from accidentally violating community standards they didn't even know existed.) On 12/1/2011 10:40 AM, Joe Hourcle wrote: On Dec 1, 2011, at 10:29 AM, Ross Singer wrote: On Thu, Dec 1, 2011 at 10:09 AM, Richard, Joel M richar...@si.edu wrote: I feel this whole situation has tainted things somewhat. :( Let's not blow things out of proportion. The aforementioned wrong-doing actually seems pretty innocent (there is backstory in the IRC channel, I'm not going to bring it up here). There is a valid case for advertising interest in your talks (or location, or t-shirt design, etc.), especially in an extremely crowded field, and we've never explicitly set a policy around what is appropriate and what isn't. I think a simple edit on the part of the accused would clear up any ambiguity of intention. Our one known incident was handled privately, but didn't really cause us to address the potential for impropriety. We seem to have quite a bit of support for the splash page. If people will help me draft up the wording -- ideally something we can point to when we want to guide people in the right direction in other forums -- I think we can put this issue to bed. It depends on how harsh you want to be ...
I mean, if you're on the fence about ballot stuffing, you could go with something like: When voting, we expect you to actually read through the list, and pick the best ones. So yes, go ahead and vote for your friends and colleagues, but also read through the others to find other equally good proposals. -Joe