[CODE4LIB] Stats and public wireless devices

2012-12-18 Thread Walter Lewis
I know this is more of a hardware question than a code question but I suspect 
that a few of the folks that have other systems roles might be able to steer me 
in the right direction.

We're looking to replace the public wifi in the library, by itself nothing 

The key requirement, after reliable connectivity, is the ability to produce some 
level of statistics relative to usage.  (I know: lies, damned lies and usage 
statistics).  We don't run a proxy or any other system that the public need a 
login to use.  I expect a fair number of connections that would just be staff 
walking in with a smart phone or other device.

After the laughter subsides, any thoughts as to a suitable device?


[CODE4LIB] Timelines (was: visualize website)

2012-08-31 Thread Walter Lewis
On 2012-08-30, at 1:03 PM, miles stauffer wrote:

 Is this what you are looking for?

The site points to TimelineJS at http://timeline.verite.co/ for timeline 
There is also the widget from the SIMILE project at MIT at 

Are there other suggestions for tools for time line visualizations?


[CODE4LIB] Expectations for count queries

2012-03-21 Thread Walter Lewis
In the various bundles of good ideas that represent result set
standards in the library and greater world, apart from the
atom/opensearch totalResults element, is there an expectation of how
one should package a number when that is *all* that is being

Use Case:
  dear dataset:
   if I asked you for steamboat records, how many would you send me?
   signed:  curious

  dear curious:
signed: dataset

I'm inclined to return just the number as Content-Type: text/plain.

Clearly the semantics of the query string require a mutual
understanding, but that's not my specific concern here.
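
A minimal sketch of the text/plain option described above, written as a Python WSGI application. The `q` parameter name, the `RECORDS` list and the substring matching are all assumptions made for illustration; nothing here reflects an existing standard or a real service.

```python
from urllib.parse import parse_qs

# Hypothetical in-memory dataset; a real service would query its own index.
RECORDS = ["steamboat Alpha", "steamboat Beta", "schooner Gamma"]

def count_matches(term):
    """Count records containing the term, ignoring case."""
    return sum(1 for record in RECORDS if term.lower() in record.lower())

def count_app(environ, start_response):
    """WSGI app: answer ?q=term with nothing but the count, as text/plain."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    term = query.get("q", [""])[0]
    body = str(count_matches(term)).encode("ascii")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=us-ascii"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

A client asking `?q=steamboat` gets back a body holding only the digits of the count. That is trivial to consume, but it carries no context about how the query was interpreted, which is the trade-off the question is about.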



2012-03-14 Thread Walter Lewis
On 2012-03-14, at 2:11 PM, Bess Sadler wrote:

 Q1. Is there an ILS that is not based on MaRC records?
 A1. No, not to my knowledge. Yes, marc cataloging can seem tedious and 
 arcane, but we have lots of tools for working with it at this point. All 
 commercial ILS vendors that I am aware of use it, and the open source ILS 
 products I know of also use MaRC.

Further note to this.  
a) All the commercial and non-commercial ILS systems used by more than one 
institution of which I am aware either added MARC processing or died.  
b) All of the systems for which I have seen the underpinnings have mapped the 
important values from the Marc record into various other SQL data structures.  
They may store the Marc on the side or  assemble it on the fly at the point of 
demand.  Marc enters and exits the system but may or may not drive the 

Walter Lewis
  who would happily forget everything he learned about Marc; but honestly folks 
there are lots of things that make less sense in the world

Re: [CODE4LIB] Metadata war stories...

2012-01-25 Thread Walter Lewis
On 2012-01-25, at 10:06 AM, Becky Yoose wrote:

 - Dirty data issues when switching discovery layers or using legacy/vendor
 metadata (ex. HathiTrust)

I have a sharp recollection of a slide in a presentation Roy Tennant offered up 
at Access  (at Halifax, maybe), where he offered up a range of dates extracted 
from an array of OAI harvested records.  The good, the bad, the 
incomprehensible, the useless-without-context (01/02/03 anyone?) and on and on. 
 In my years of migrating data, I've seen most of those variants (except ones 
*intended* to be BCE).
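
To make the useless-without-context case concrete, here is a small Python sketch that lists every date a slash-separated string could mean. The three candidate formats are an assumption (harvested data may use others), and `%y` applies Python's own century pivot, which is itself a guess for historical material.

```python
from datetime import datetime

# Three common readings of a slash-separated date with two-digit parts.
CANDIDATE_FORMATS = {
    "day/month/year": "%d/%m/%y",
    "month/day/year": "%m/%d/%y",
    "year/month/day": "%y/%m/%d",
}

def possible_readings(text):
    """Return every ISO date the string could mean under the candidate formats."""
    readings = {}
    for label, fmt in CANDIDATE_FORMATS.items():
        try:
            readings[label] = datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            pass  # this reading is impossible for this string
    return readings

readings = possible_readings("01/02/03")
# Three different dates from one string: 2003-02-01, 2003-01-02 and 2001-02-03.
```

Without knowing which convention the source used, all three answers are equally defensible, so the value cannot be normalized safely.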

Then there are the fielded data sets without authority control.  My favourite 
example comes from staff who nominally worked for me, so I'm not telling tales 
out of school.  The classic Dynix product had a Newspaper index module that we 
used before migrating it (PICK migrations; such a joy).  One title had twenty 
variations on Georgetown Independent (I wish I was kidding) and the dates 
ranged from the early ninth century until nearly the 3rd millennium. (apparently 
there hasn't been much change in local council over the centuries).

I've come to the point where I hand-walk the spatial metadata into links to 
geonames.org for the linked open data. Never had to do it for a set with more 
than 40,000 entries though.  The good news is that it isn't hard to establish a 
valid additional entry when one is required.


Re: [CODE4LIB] My crazed idea about dealing with registration limitations

2011-12-22 Thread Walter Lewis
On 2011-12-22, at 1:55 PM, Peter Noerr wrote:

 Crazy variation number 3. Have two tracks which are identical, but time 
 shifted by half a day (or some other convenient unit). The presenters talk 
 twice on the same day - in the morning for track A and the afternoon for 
 track B. That way there is no speaker gulag, no time over-run (though, 
 following Declan's point, how much time is left out of the week after 
 travelling, so why not the whole week), and you get a chance to hear a really 
 interesting presentation twice - or miss it twice! 

One of the things I've always enjoyed about single track conferences like 
Code4Lib and Access is that when you are speaking you don't miss all the other 
great (and more often than not, greater) presentations happening in other rooms 
while you're talking about stuff you already know.  It might be different for 
some folks, but for some of us giving a presentation is *mostly* an excuse to 
get our employers to release us from other duties and fund travel and the 
opportunity to learn.  


Re: [CODE4LIB] Patents and open source projects

2011-12-06 Thread Walter Lewis
On 6 December 2011, at 9:46 AM, Roy Tennant wrote:

 I once got a cease and desist letter from a legal firm defending someone's 
 trademark for metadata. I mean, seriously. Perhaps obviously, I ignored it. 
 It's still in my files somewhere.

We had a variation in Ontario back in the 90s when a businessman working with 
libraries heard the phrase "virtual library" pass my lips in conversation.  
Next thing I knew, he thought he had trademarked it.  

I try never to use the phrase these days, and he left the library market.

I can't begin to recall which of you I heard it from first.

Walter Lewis
Halton Hills

Re: [CODE4LIB] Programmer Orientation to Library/Lib Sci

2011-07-22 Thread Walter Lewis
On 22 July 2011, at 1:07 PM, Bigwood, David wrote:

 The extended ASCII character set, Latin-1, used in the old MARC systems was 
 always something that was neglected to get mentioned and not at all obvious. 
 Now that more systems are using UNICODE it should be less of a problem, all 
 depends on your system and if you still have legacy data.

Isn't Marc-8 different than Latin-1 in how it handles accents?

At least that's how I read
... and I'd never argue with Michael about this. :)
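
A rough Python sketch of the difference in question: MARC-8 writes a diacritic as a separate combining byte placed before the base letter, while Latin-1 uses a single precomposed byte. The table below is a deliberately tiny subset of the MARC-8 combining marks, and the function name is hypothetical; this is not a full MARC-8 decoder.

```python
import unicodedata

# Small illustrative subset of MARC-8 (ANSEL) combining diacritics.
MARC8_COMBINING = {
    0xE1: "\u0300",  # grave
    0xE2: "\u0301",  # acute
    0xE3: "\u0302",  # circumflex
    0xE8: "\u0308",  # umlaut / diaeresis
}

def marc8_subset_to_unicode(data):
    """Decode ASCII plus the marks above; each mark moves after its base letter."""
    out = []
    pending = []  # combining marks waiting for their base character
    for byte in data:
        if byte in MARC8_COMBINING:
            pending.append(MARC8_COMBINING[byte])
        elif byte < 0x80:
            out.append(chr(byte))
            out.extend(pending)
            pending = []
        else:
            raise ValueError(f"byte 0x{byte:02X} is outside this sketch's subset")
    return unicodedata.normalize("NFC", "".join(out))

marc8_bytes = b"caf\xe2e"                 # acute accent byte, then the letter e
latin1_bytes = "café".encode("latin-1")   # one precomposed byte, 0xE9
```

Reading `marc8_bytes` as if it were Latin-1 yields a stray accented character in front of a plain "e", which is the classic symptom of confusing the two encodings.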

Walter Lewis
   who never met a character set he didn't wish he hadn't *had* to meet

Re: [CODE4LIB] Seth Godin on The future of the library

2011-05-17 Thread Walter Lewis
On 17 May 2011, at 11:18 AM, Jonathan Rochkind wrote:

 On 5/16/2011 7:52 PM, Luciano Ramalho wrote:
   And then we need to consider the rise of the Kindle. An ebook costs
   about $1.60 in 1962 dollars. A thousand ebooks can fit on one device,
 1) Why quote the ebook price in 1962 dollars? The reality in 2011 is
 that Kindle books in general are too expensive, particularly when
 Yeah, how much did a paperback book cost in 1962?  50 cents? $1?  I wasn't 
 alive then, but I bet $1.60 is expensive in 1962 dollars!

I usually use one of two inflation factors (the economists use a larger basket):
a) what would that house have cost me then?
b) what would I have earned on minimum wage then if I wasn't in a job that 
supplied room and board?

In the US, minimum wage in 1962 was $1.15/hour; in 2009 it was $7.25  (x6.3).  
I wish paperbacks had only inflated at that rate.

Local to where I am, the houses that in 1962 were offered for $12,000 go now in 
the $360,000 range  (x30)
That's actually not far off what I'm seeing for some of the thicker 
paperbacks this year.
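
The two factors can be checked with a few lines of Python. The 50-cent paperback is only the guess floated in the quoted message, not a verified 1962 price.

```python
def inflation_factor(then, now):
    """Ratio of a current price to its historical counterpart."""
    return now / then

wage_factor = inflation_factor(1.15, 7.25)         # minimum wage, 1962 vs 2009
house_factor = inflation_factor(12_000, 360_000)   # local house prices

# Apply each factor to a guessed 50-cent paperback from 1962.
paperback_1962 = 0.50
by_wages = round(paperback_1962 * wage_factor, 2)
by_houses = round(paperback_1962 * house_factor, 2)
```

The wage factor puts the paperback a little over three dollars, while the house factor puts it at fifteen.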

Walter Lewis

Re: [CODE4LIB] Code4Lib 2011 Proposals

2010-03-03 Thread Walter Lewis
On 3 Mar 10, at 9:52 AM, Julia Bauder wrote:

 Also, the farther north we go, the more likely that snow+airplane
 incompatibilities will foil speakers' (and attendees'!) travel plans at the
 last minute, which isn't fun for anyone.

Actually there is a clear line (at least on the eastern half of the continent) 
where the further north you go, the *less* snow you got this winter.  Buffalo is 
trailing a number of places on the east coast in total snow accumulation and 
Toronto has been dusted a few times this winter, with nothing of real 
substance.  Detroit and Chicago were well below seasonal averages last time I 

ALL of that said,  where are the San Diego gang or the folks from Miami?

  who can only dream of pubs with open patios in February

Re: [CODE4LIB] Kingston? And now the date (was Re: [CODE4LIB] Location of the first Code4Lib North meeting?)

2010-01-29 Thread Walter Lewis
On 29 Jan 10, at 5:34 PM, Wendy Huot wrote:

 +1 Thursday-Friday 6-7 May
 The dates of 6th and 7th work for me and I think they work for Kingston.  
 Bill: librarian-hunting season begins in the late Fall, so we're in the clear.

+1 for me too.

I should note that while the standard librarian-hunting seasons overlap for 
public and academic librarians, there is a special sitting duck hunt that 
coincides with the municipal budgeting process.  In some communities, like 
ours, it is actually televised  (think the worst bass-fishing show you've ever 
flipped past).


Re: [CODE4LIB] Location of the first Code4Lib North meeting?

2010-01-25 Thread Walter Lewis
On 25 Jan 10, at 11:23 AM, MJ Suhonos wrote:

 Might only be an issue crossing at the Detroit-Windsor border, though.  Not 
 sure how broadly his opinion may have spread beyond the state.

I think the key to the troubles at Windsor can be linked either to
a) Art Rhyno confessing at the border crossing he was going to be paid for 
going to a library conference (some XML thing), or
b) an American (name slips my mind) who ran into issues coming to Access when 
it was held in Windsor.

In short, it isn't a general US/Canadian border problem.  The evidence would 
suggest it is directly related to the University of Windsor's Leddy Library 
being too close to the bridge over the Detroit River. 


Re: [CODE4LIB] Location of the first Code4Lib North meeting?

2010-01-20 Thread Walter Lewis
On 20 Jan 10, at 10:16 AM, MJ Suhonos wrote:

 I think mode of transportation is something to consider; for those of us in 
 South/Eastern Ontario, most of the cities are relatively reachable within a 
 few hours by ground (excepting Sudbury, unfortunately).
 However, for those out-of-province coming via air transport, Kingston is at 
 least 2h from the closest major airport (Ottawa).  [NB: don't get me wrong; 
 as a Queen's graduate, I love Kingston very much].

As another Queen's grad (a little before MJ, I fear), I am also guilty of 
forgetting that Air Canada has about eight flights a day that drop into the 
airport at Kingston.  As a student it was mostly VIA, the bus or the thumb.  
That said, I would be driving right by Toronto's Pearson Airport on my way down 
and could time passage to do a pickup (or two).

  who loves Kingston in May and June

Re: [CODE4LIB] Location of the first Code4Lib North meeting?

2010-01-20 Thread Walter Lewis
On 20 Jan 10, at 2:30 PM, David Fiander wrote:

 Of course, as a corollary to the fact that all the locations being
 discussed are Canadian (well, except for Montreal), any Americans
 resident in the USA on the list do need to make sure that their
 passports will be valid through to the end of May, at least, in order
 to ensure you will be able to attend.

Is Canadian customs now requiring US Passports?  Used to be Hotel California:  
you could come over, but without your passport you couldn't go home.


Re: [CODE4LIB] Location of the first Code4Lib North meeting?

2010-01-20 Thread Walter Lewis
On 20 Jan 10, at 2:39 PM, Wendy Huot wrote:

 Regarding travel to Kingston:
 * For an interesting drive from upstate NY, you can get from Cape Vincent, NY 
 to Kingston by way of Wolfe Island + ferry.

Driving across the Thousand Islands Bridge is faster, but the interesting 
quotient goes way up via Wolfe Island  (two ferries: one cheap, one free)


Re: [CODE4LIB] Location of the first Code4Lib North meeting?

2010-01-20 Thread Walter Lewis
On 20 Jan 10, at 2:53 PM, David Fiander wrote:

 Walter plans on going to Kingston by way of Buffalo and Cape Vincent,
 just so he can take the ferries.

I've done just that, ... taking in a few lighthouses and harbours along the 
way!  (and  special collections at Cornell and Syracuse).


Re: [CODE4LIB] preconference proposals - solr

2009-11-13 Thread Walter Lewis
On 13 Nov 09, at 11:25 AM, Bess Sadler wrote:

 1. Morning session - solr white belt
 [delightful descriptions snipped]
 2. Morning session - solr black belt
 3. Afternoon session - Blacklight

Is there any chance that the black belt session needs to be/should be a two 
parter and run through the afternoon as well?  ... or repeat for those who have 
just acquired their white belts but are headed in different directions?

  who is happy to get all the direction on solr he can find

Re: [CODE4LIB] Long way to be a good coder in library

2009-07-23 Thread Walter Lewis

Ed Summers schrieb:

The first step is admitting that you are unable to understand *all*
the crazy library technology lingo, and that library-technology
environment as a whole has become unmanageable. :-)
If all else fails, as a noise filter, you could also do worse than to 
track the technologies that Ed Summers is interested in, or has 
contributed to ...

Walter Lewis
   part time edsu groupie

Re: [CODE4LIB] Open, public standards v. pay per view standards and usage

2009-07-14 Thread Walter Lewis

William Wueppelmann wrote:

I'm not entirely sure that TCP/IP and the other IETF RFCs became 
established because of restrictions placed on OSI. I was under the 
impression that OSI was also insanely complicated and that the IETF 
standards were much cheaper to implement from a technical standpoint. 
And, from a product standpoint, in the mid-90s, there were still a lot 
of bets being placed on closed online services like AOL, MSN, and 
Not to mention the book I once saw on MS Blackbird ... (MSN .0001?) 
which, thankfully, was abandoned before leaving the nest.

Any examples closer to the library world?
What I had been hoping for were data standards more in the library 
space.  I've read ANSI/NISO Z39.19 which deals with monolingual thesauri.

 (a copy lives here:  http://www.slis.kent.edu/~mzeng/Z3919/8Z3919toc.htm)
Near as I can tell the parallel multi-lingual standard is ISO 5964 and 
is available at

for a fee of 168 Swiss francs (CHF)  or ~$155USD

I pay attention to the one, and never expect to read the other.

This past week I was on the edge of another discussion of standards with 
associated controlled vocabularies (in the K-12 domain) where a 
criticism was raised that it wasn't Creative Commons with an Attribution 
requirement, else how could you teach it?

That got me thinking about whether we shouldn't have already learned 
that lesson because the 'net largely runs on public RFCs, but wondered 
if I wasn't missing other examples inside our domain.


[CODE4LIB] Open, public standards v. pay per view standards and usage

2009-07-13 Thread Walter Lewis

Are there any blindingly obvious examples of instances where
   a) a standards group produced a standard published by a body which 
      charged for access to it
   b) an alternative standards group produced a competing standard that 
      was openly accessible
and the work of group a) was rendered totally irrelevant because most 
non-commercial work ignored it in favour of b)?

My instinct is to quote the battle between OSI (ISO) and TCP/IP (IETF 
RFCs).  Does that strike others as appropriate?

Any examples closer to the library world?

Walter Lewis

Re: [CODE4LIB] HTML mark-up in MARC records

2009-06-21 Thread Walter Lewis

Doran, Michael D wrote:
Is anybody else embedding HTML mark-up code in MARC records [1]?  We're currently including an img tag in some MARC Holdings records in the 856z [2].   I'm inclined to think that HTML mark-up does not belong anywhere in MARC records, but am looking for other opinions (preferably with the reasoning behind the opinions), both pro and con.  
One of the things I found in some specific instances where I was 
generating Marc-like records on the fly from records that could have 
embedded HTML (i.e. MARBI marc community output) was that a variety of 
the targets that could read the data didn't know what to do with the 
tags and escaped them before passing them to the web client.  In 
short, consider the downstream partners who may try and render the HTML 
and what interfaces they are using.  Not everyone views the record via a 
browser ... :)
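
A small Python illustration of what such a downstream target does. The note text and file name are made up, and `render_downstream` is a hypothetical stand-in for any interface that escapes field values before display.

```python
from html import escape

def render_downstream(field_value):
    """Mimic a target that treats field values as plain text and escapes them."""
    return escape(field_value)

# Hypothetical 856$z public note carrying embedded mark-up.
note = 'Finding aid <img src="icon.gif" alt="PDF">'
shown = render_downstream(note)
# The user now sees the literal tag text instead of an image.
```

Whether the mark-up survives therefore depends on every consumer of the record, not only on the catalogue that stored it.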

Walter Lewis

Re: [CODE4LIB] best OCR package?

2009-02-03 Thread Walter Lewis

Randy Stern wrote:
Abbyy Finereader and Nuance Omnipage are the two leading commercial 
OCR products. Both can achieve 98% + character accuracy on most 
book-like material scanned at 300 dpi.

At 07:37 AM 2/3/2009 -0500, Nicole Engard wrote:

I'm with Christian - I loved Abbyy FineReader when I used it at both
my previous libraries.  It's very accurate and it's affordable if
you're not using it for mass digitization :) but we never got the
server contract because like Christian said - it is quite expensive.
Abbyy's engine is actually quite affordable for mass digitization 
efforts as well.  Indeed, if you look closely at the outputs from the 
Internet Archive you'll see they use it extensively.  The desktop model 
requires bodies to handle the inputs and outputs; the server version can 
be built into a workflow.  Once you get past the time to set it up, the 
cost per page is *very* low (from memory, ~1 to 2 cents per page).

Walter Lewis

Re: [CODE4LIB] best OCR package?

2009-02-03 Thread Walter Lewis

Gabriel Farrell wrote:

On Tue, Feb 03, 2009 at 10:09:54AM -0500, Walter Lewis wrote:
If we had to correct it all: a) it would never get done and b) it would  
be better than some of the originals which are rife with typographic 

Hence the genius of Distributed Proofreaders [1] and reCAPTCHA [2].

[1] http://www.pgdp.net/c/
[2] http://recaptcha.net/learnmore.html
I have tremendous respect for the genius behind these projects, but the 
Victorian four page village newspapers have enough text for your 
average government report.  Put four together and you get a three-decker 
novel. The folks in the Distributed Proofreaders rarely sign up for the 
labours of Hercules (and, according to my sources, he only hung in there 
for twelve tasks).

Then you have to deal with the fact that OCRing some of the microfilm 
I've seen is probably not statistically different from invoking a random 
token generator ...


Re: [CODE4LIB] best OCR package?

2009-02-03 Thread Walter Lewis

Karen Coyle wrote:
I know that 98% is impressive, but I always like to remember that with 
an average of 2000 characters per page that means 40 potential errors 
per book page. Just to give us some perspective on the level of 
cleanup that will be needed for books being digitized today.
The good news from the perspective of searching is that a reasonable 
percentage of those errors will affect terms that are either rarely used 
in searching or are repeated correctly in the vicinity. 

The bad news:  phrase search is compromised. Screen readers for the 
visually impaired are compromised. Relevance that depends on term 
clustering is compromised.

If we had to correct it all: a) it would never get done and b) it would 
be better than some of the originals which are rife with typographic errors.
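
The arithmetic behind both the quoted figure and the phrase-search worry, as a Python sketch. It assumes character errors are independent and evenly spread, which real OCR output rarely satisfies, so treat the results as rough orders of magnitude.

```python
def expected_errors(chars_per_page, char_accuracy):
    """Expected number of wrong characters on a page."""
    return chars_per_page * (1 - char_accuracy)

def chance_intact(length_in_chars, char_accuracy):
    """Probability that a run of characters has no errors at all."""
    return char_accuracy ** length_in_chars

errors_per_page = expected_errors(2000, 0.98)   # the quoted 40 per page
one_word = chance_intact(5, 0.98)               # a single five-letter word
three_word_phrase = chance_intact(15, 0.98)     # three such words in a row
```

Under these assumptions a five-letter word survives about nine times in ten, but a three-word phrase survives only about three times in four, which is why phrase search degrades faster than single-term search.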

 who still regrets the Swedish Chef OCR of most microfilm newspaper projects

Re: [CODE4LIB] Zotero under attack

2008-09-28 Thread Walter Lewis

Peter Murray wrote:
The version of EndNote I have (circa 2005) came with a couple dozen 
styles, and as of now Thomson Scientific has 3,500 up on their EndNote 
Styles website.
I had read the original claim as "we export citations accepted at 3500 
journals" (most of which they might have been able to accomplish with the 
couple dozen styles in question given the popularity of MLA, APA etc.).  
How much of the 3500 claim is copy/paste as distinct from fresh 
intellectual effort?

Were they not claiming:
   a) we invented an internal data model that allows us to produce all 
      these (different?) outputs
   b) you reverse-engineered our data model
   c) people can now export their citations from our data model in our 
      proprietary software to your free software
   d) this is hurting our sales (or the tea leaves suggest it will)
   e) Stop. Send money ... lots.

Walter Lewis

Re: [CODE4LIB] marc records sample set

2008-05-09 Thread Walter Lewis

Bess Sadler wrote:

3. Are there features missing from the above list that would make
this more useful?

One of the things that Bill Moen showed at Access a couple of years ago
(Edmonton?) was what he and others were calling a "radioactive" Marc
record.  One that had no normal payload but, IIRC, had a 245$a whose
value was "245$a" etc.  As I recall, it was used to test processes where
you wanted to be sure that a specific field was mapped to a specific
index, or was showing in a particular Z39.50 profile.
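
The idea as recalled above can be sketched in Python with plain dictionaries standing in for a MARC record. The field list, `INDEX_MAP` and both function names are hypothetical; this is not the actual test record or tooling shown at Access.

```python
def radioactive_record(tags_and_subfields):
    """Build a record whose every subfield value names its own location."""
    record = {}
    for tag, codes in tags_and_subfields.items():
        record[tag] = {code: f"{tag}${code}" for code in codes}
    return record

# Hypothetical mapping under test: index name -> (tag, subfield code).
INDEX_MAP = {
    "title": ("245", "a"),
    "author": ("100", "a"),
    "subject": ("650", "a"),
}

def build_index(record, index_map):
    """Pull the mapped subfield for each index, as an indexer would."""
    return {name: record[tag][code] for name, (tag, code) in index_map.items()}

record = radioactive_record({"100": "a", "245": "ab", "650": "a"})
index = build_index(record, INDEX_MAP)
```

Because each value announces where it came from, a glance at the title index shows immediately whether it was really fed from 245$a or from somewhere else.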



2008-04-03 Thread Walter Lewis

Sebastian Hammer wrote:

A true hacker has no need for these crude tools. He waits for cosmic
radiation to pummel the magnetic patterns on his drive into a pleasing
and functional sequence of bits.

Alas, having been doing this (along with my partners, the four
Yorkshiremen) since the Stone Age ...

We used to arrange pebbles in the middle of the road into the relevant
patterns (we *dreamed* of being able to afford the wire for an abacus).
Passing carts would then help crunch the numbers.

   for whom graph paper, templates, pencils, 80 column punchcards and
IBM Assembler were formative experiences

Re: [CODE4LIB] [Web4lib] Library Staff Scheduler

2007-09-05 Thread Walter Lewis

Bigwood, David wrote:

Yes, some do move between branches.

... and a variable to keep in mind depending on the size of the system
(and the state of local traffic) ... time between branches.  When we
move staff between our two branches we have to make sure that coverage
is there for the period between one desk and another.  For example, for
a 12:00 shift start in branch 2, I have to leave branch 1 no later than
11:30 (and did anyone consider my lunch?)
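
The travel-time variable can be expressed as a simple check, sketched here in Python. The 30-minute travel time is inferred from the example above, and the function names are hypothetical rather than taken from any scheduling product.

```python
from datetime import datetime, timedelta

def latest_departure(shift_start, travel_time):
    """Latest moment someone can leave one branch and start on time at the other."""
    return shift_start - travel_time

def uncovered_window(desk_shift_end, next_shift_start, travel_time):
    """Return the (start, end) span the first desk is uncovered, or None."""
    leave_by = latest_departure(next_shift_start, travel_time)
    if desk_shift_end <= leave_by:
        return None  # the desk shift already ends early enough
    return (leave_by, desk_shift_end)

noon = datetime(2007, 9, 5, 12, 0)
travel = timedelta(minutes=30)
leave_by = latest_departure(noon, travel)
gap = uncovered_window(noon, noon, travel)
```

A scheduler that ignores travel time would happily book one person on both desks back to back, leaving the half hour before noon uncovered at the first branch.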

Walter Lewis
glad not to be doing branch lunch coverages any more

Re: [CODE4LIB] [Web4lib] Library Staff Scheduler

2007-09-05 Thread Walter Lewis

Deb Bergeron wrote:

Lunch?  You get to have lunch?! ;-)

The absence of a lunch opportunity for the person covering lunches in
the smaller branch was, in fact, the great irony of the exercise. :)


Re: [CODE4LIB] not munging reply-to (was Re: [CODE4LIB] E-Resource Access Management Services)

2007-03-30 Thread Walter Lewis

Ed Summers wrote:

There are strong religious arguments on both sides of this issue...and
they are both equally boring.


who has managed to screw up no matter what the list settings

Re: [CODE4LIB] auto-anthologizing

2007-02-15 Thread Walter Lewis

Laura Smart wrote:

 At 05:17 AM 2/15/2007, you wrote:

  (Does your feedreader lose its flavor on your next post overnight?)
 If your readers say don't chew on it, but you edit it in spite?
Or if the comments say you're wrong, but you edit so you're right?

Thank you all for making me actually spill my morning coffee.
Now only to film a barbershop quartet of librarians singing it via

... just needs two more lines for the chorus, plus three verses :)


[CODE4LIB] ISBD punctuation was [CODE4LIB] Getting data from Voyager into XML?

2007-01-19 Thread Walter Lewis

Erik Hatcher wrote:

I am, however, skeptical of a purely MARC -> XSLT -> Solr solution.
The MARC data I've seen requires some basic cleanup (removing dots at
the end of subjects, normalizing dates, etc) in order to be useful as
facets.  While XSLT is powerful, this type of data manipulation is
better (IMO) done with scripting languages that allow for easy
tweaking in a succinct way.

Perhaps what Erik's put his finger on here is as good an excuse as any
to raise the Death To ISBD Punctuation banner one more time.  Some
60s/70s field termination punctuation rules are at the heart of most of
the crud you're trying to scrape off these records.  If ever there was a
set of encoding rules that were more misguided, I've been fortunate not
to encounter them.
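
A hedged Python sketch of the kind of cleanup being discussed: stripping trailing ISBD-style punctuation from a value before using it as a facet. The set of characters removed and the initialism guard are assumptions, not a complete treatment of the rules.

```python
import re

# Trailing punctuation commonly left at the end of a field or subfield.
_TRAILING_ISBD = re.compile(r"\s*[.,;:/]+\s*$")
# Values ending in an initialism such as "D.C." keep their final period.
_INITIALISM = re.compile(r"(?:[A-Z]\.){2,}$")

def strip_trailing_punctuation(value):
    """Remove trailing ISBD-style punctuation unless it ends an initialism."""
    value = value.rstrip()
    if _INITIALISM.search(value):
        return value
    return _TRAILING_ISBD.sub("", value)

facets = [strip_trailing_punctuation(v) for v in (
    "Steamboats -- Ontario.",
    "Great Lakes /",
    "Washington, D.C.",
)]
```

Even this small guard misses cases (a name ending in a single initial would still lose its period), which illustrates why the punctuation is a nuisance to undo after the fact.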


Re: [CODE4LIB] Server names at libraries

2006-10-27 Thread Walter Lewis

David J. Fiander wrote:

Naming computers is always fun.  My main computer at home is always
Golem, and if I ever had had the power to name a series of
computers, I was going to name them after famous Canadian maritime
disasters (Erebus and Terror were going to be the first two).

My development machines have always been named after ships that have
been named after something else, just to supply both a theme and an
inside joke. A sad pathetic life perhaps, but there it is.  So I'm
writing this on Bohemian, while testing some code on Corinthian, and
pulling email from Assiniboia.

That said, I've always *avoided* ship names associated with major
collisions and fires.  So I'll leave Noronic, Hamonic, and Bavarian for
David.  :)

RFC 1178 (http://www.apps.ietf.org/rfc/rfc1178.html) has some good
dos and don'ts for naming a computer, and is a pretty fun read too.

That *has* to be the most interesting RFC it has ever been my pleasure
to read


Re: [CODE4LIB] Server names at libraries

2006-10-27 Thread Walter Lewis

Richmond,Ian wrote:

  What about naming the server so that users would know what it did from
the name?  We used to have a library web server named libweb, which I
always liked, as it sort of made sense to people.

That's what we do with DNS.  Our internal names are almost never exposed
to the public.  However, the object is to separate roles (www.ourdomain,
mail.ourdomain, news.ourdomain etc.) from the underlying machines and
their names.  I have machines that have up to 20 DNS identities ... but
then we host multiple sites in our community.
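
The separation of roles from machines can be pictured as a lookup table, sketched here in Python. The `example.org` names and the machine assignments are purely illustrative; a real setup would express this as DNS records rather than a dictionary.

```python
# Hypothetical role names (what the public sees) mapped to machine names.
ROLE_TO_MACHINE = {
    "www.example.org": "bohemian.internal.example.org",
    "mail.example.org": "corinthian.internal.example.org",
    "news.example.org": "bohemian.internal.example.org",
}

def machine_for(table, role):
    """Resolve a public role name to the machine currently serving it."""
    return table[role]

def roles_on(table, machine):
    """List every public identity a given machine answers to."""
    return sorted(role for role, host in table.items() if host == machine)

def move_role(table, role, new_machine):
    """Repoint a role at another machine without changing the public name."""
    updated = dict(table)
    updated[role] = new_machine
    return updated

after_move = move_role(ROLE_TO_MACHINE, "news.example.org",
                       "corinthian.internal.example.org")
```

Users keep typing the role name, so a service can move between machines (or several roles can share one machine) without anyone outside noticing.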