Re: [CODE4LIB] Ruby on Windows

2013-10-02 Thread Joe Atzberger
To summarize options:

   - Linux VM in VirtualBox (Ubuntu, Fedora, CentOS, etc.)
   - Groovy (a dynamic JVM language) is an excellent cross-platform option,
   one I use daily, especially if you are coming from a Java background.
   The Groovy web framework comparable to Rails is Grails.

Packaging in Ruby is one of the worst downsides of an otherwise compelling
language, and getting it onto Windows is more than I would bother with.

If you are doing Groovy, I'd still develop it on Linux (for ease of
integration with various documentation and tutorials, and of course
personal preference).

--Joe

On Tue, Oct 1, 2013 at 5:13 PM, Joshua Welker wel...@ucmo.edu wrote:

 I'm using Windows 7 x64 SP1. I am using the most recent RubyInstaller
 (2.0.0-p247 x64) and DevKit (DevKit-mingw64-64-4.7.2-2013022-1432-sfx).

 That's disappointing to hear that most folks use Ruby exclusively in *nix
 environments. That really limits its utility for me. I am trying Ruby
 because dealing with HTTP in Java is a huge pain, and I was having
 difficulties setting up a Python environment in Windows, too (go figure).

 Josh Welker


Re: [CODE4LIB] more on MARC char encoding

2012-04-26 Thread Joe Atzberger
All these should end up in the hypothetical if not actual "MARCthulhu"
repository.  Anybody heard from Simon whether that is still happening or
not?  If he doesn't have anything, we should just start a fresh pile on
GitHub.

--joe

On Fri, Apr 20, 2012 at 12:01 PM, Doran, Michael D do...@uta.edu wrote:

 Hi Sophie,

  To better understand the character encoding issue, can anybody
  point me to some resources or a list, like "UTF8 encoded data but
  not in the MARC8 character set"?

 That question doesn't lend itself to an easy answer.  The full MARC-8
 repertoire (when you include all of the alternate character sets) has over
 16,000 characters.  The latest version of Unicode consists of a repertoire
 of more than 110,000 characters.  So a list of "UTF8 encoded data not in the
 MARC8 character set" would be a pretty long list.

 For a more *general* understanding of character encoding issues, I would
 recommend the following resources:

 For a quick library-centric overview, the "Coded Character Sets: A Technical
 Primer for Librarians" web page [1].  Included is a page of "Resources on
 the Web", which has an emphasis on library automation and the internet
 environment [2].

 For a good explanation about how character sets work in relational
 databases (as part of the more general topic of globalization/I18n), the
 Oracle Globalization Support Guide [3].

 For all the ins and outs of Unicode, the book "Unicode Explained" by Jukka
 Korpela [4].

 -- Michael

 [1] http://rocky.uta.edu/doran/charsets/

 [2] http://rocky.uta.edu/doran/charsets/resources.html

 [3] http://docs.oracle.com/cd/B19306_01/server.102/b14225/toc.htm

 [4] http://www.amazon.com/gp/product/059610121X/

 # Michael Doran, Systems Librarian
 # University of Texas at Arlington
 # 817-272-5326 office
 # 817-688-1926 mobile
 # do...@uta.edu
 # http://rocky.uta.edu/doran/



  -Original Message-
  From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
  Deng, Sai
  Sent: Friday, April 20, 2012 8:55 AM
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: Re: [CODE4LIB] more on MARC char encoding
 
  If a canned cleaner can be added in MarcEdit to deal with smart
  quotes/values, that would be great! Besides the smart quotes, please
  consider other special characters, including chemistry and mathematics
  symbols (these are different types of special characters, right?). To
  better understand the character encoding issue, can anybody point me to
  some resources or a list like "UTF8 encoded data but not in the MARC8
  character set"? Thanks a lot.
  Sophie
 
  -Original Message-
  From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
  Jonathan Rochkind
  Sent: Thursday, April 19, 2012 2:14 PM
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: Re: [CODE4LIB] more on MARC char encoding
 
  Ah, thanks Terry.
 
  That canned cleaner in MarcEdit sounds potentially useful -- I'm in a
  continuing battle to keep the character encoding in our local MARC corpus
  clean.

  (The real blame here is on cataloger interfaces that let catalogers save
  data that are illegal bytes for the character set it's being saved as.
  And/or display the data back to the cataloger using a translation that
  lets them show up as expected even though they are _wrong_ for the
  character set being saved as.  Connexion is theoretically the Rolls-Royce
  of cataloger interfaces; does it do this? Gosh I hope not.)
 
  On 4/19/2012 2:20 PM, Reese, Terry wrote:
   Actually -- the issue isn't one of MARC8 versus UTF8 (since this data
  is being harvested from DSpace and is UTF8 encoded).  It's actually an
  issue with user-entered data -- specifically, smart quotes and the like.
  These values obviously are not in the MARC8 character set and cause
  problems for many who transform user-entered data (smart quotes tend to
  be inserted by default on Windows) from XML to MARC.  If you are sticking
  with a strictly UTF8-based system, there generally are not issues because
  these are valid characters.  If you move them into a system where the data
  needs to be represented in MARC -- then you have more problems.
  
   We do a lot of harvesting, and because of that, we run into these types
  of issues moving data that is in UTF8, but has characters not represented
  in MARC8, into Connexion and having some of that data flattened.
  Given the wide range of data not in the MARC8 set that can show up in
  UTF8, it's not a surprise that this would happen.  My guess is that you
  could add a template to your XSLT translation that attempted to filter
  the most common forms of these smart quotes/values and replace them
  with the more standard values.  Likewise, if there were a great enough
  need, I could provide a canned cleaner in MarcEdit that could fix many of
  the most common varieties of these smart quotes/values.
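
   As an illustration only -- not MarcEdit's actual cleaner -- a minimal
  Perl sketch of that kind of flattening might look like:

      # Hypothetical helper: fold common Windows "smart" punctuation to ASCII.
      sub flatten_smart_chars {
          my $s = shift;
          $s =~ tr/\x{2018}\x{2019}\x{201C}\x{201D}/''""/;  # curly quotes
          $s =~ s/[\x{2013}\x{2014}]/-/g;                   # en/em dashes
          $s =~ s/\x{2026}/.../g;                           # ellipsis
          return $s;
      }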
  
   --TR
  
   -Original Message-
   From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
   Of Jonathan Rochkind
   Sent: Thursday, April 19, 2012 11:13 

Re: [CODE4LIB] 2012 preconference proposals wanted!

2012-01-26 Thread Joe Atzberger
I can chip in here, possibly reprising my role from last year's git
session.

If somebody else like mbklein wants to do fundamentals, I wouldn't mind
fleshing out the ecosystem of git tools, including GitHub, gitweb, gitosis
and, in particular, the necessary evil that is git-svn.

--joe

On Fri, Jan 20, 2012 at 1:58 PM, Rob Casson rob.cas...@gmail.com wrote:

 as the guy who suggested someone do this (and now, sadly, can't make
 it to seattle), thanks for doing this.  beers on me in 2013,
 rc

 On Fri, Jan 20, 2012 at 1:54 PM, Cary Gordon listu...@chillco.com wrote:
  Excellent!
 
  Let me know how I can help.
 
  Cary
 
  On Thu, Jan 19, 2012 at 10:52 PM, Michael B. Klein mbkl...@gmail.com
 wrote:
  Anjanette brought this up on the conference mailing list, and asked for a
  new facilitator. I volunteered. I was going to throw together a little
  intro and some starting points, and then throw it open to the room to
  share information and ask questions. But I think your name was on the
  board first, Cary, so if you'd like to facilitate, I'm happy to play
  either role.
 
  Michael



Re: [CODE4LIB] SV: [CODE4LIB] Plea for help from Horowhenua Library Trust to Koha Community

2011-11-28 Thread Joe Atzberger
The key thing here, if PTFS actually means what they say, is that they
should assign the trademark APPLICATION over to HLT.  Otherwise, the
posture is really just trying to convince you not to contest their
receiving the trademark, after which they can do WTF they like with it.

This is a big deal to anybody that contributes to an OSS project (as I did
with Koha for several years, at LibLime and elsewhere).  Imagine a company
like Rackspace trying to trademark "Apache" for some webserver software
they happen to run and sell services on.  You know, a project that entirely
predates their involvement, has hundreds of previous committers, and has
actually already been called "Apache" all this time.

Koha predates LibLime.  Its availability and the technical experience of
staff at Athens County PL with Koha are the reasons why LibLime could even
exist.  It wasn't called something else, it wasn't a white-label platform or
an unnamed research project, it was Koha.  LibLime contributed massively to
the codebase under GPL... to Koha.

I don't see this in the framing a lot of the stories are giving it, namely
"Large Culturally Insensitive U.S. Corporation vs. Small Friendly NZ
Library."  I see this as a fundamental OSS governance issue.  If you can't
keep this kind of appropriation from happening here, then we're all just
one patent/copyright/trademark squatter/troll away from being hijacked.

How is it we can't just cite prior art and be done with it?

--Joe Atzberger


Re: [CODE4LIB] Hotel registration - This was a test, right?

2011-11-16 Thread Joe Atzberger
"The site you are trying to access does not exist. Please contact the event
organizer to report this problem."


Re: [CODE4LIB] What's the descriptive technical terminology?... pdf image of a page. pdf format used with cut paste.

2011-04-28 Thread Joe Atzberger
I would just say "image-based" or "text-based".  Sorry if you wanted
something more hifalutin.

There is another level of granularity though, inasmuch as you can publish a
text-based PDF that attempts to prevent copy/paste.  Like websites with
their javascript hacks, it isn't really secure, it just instructs Acrobat
Reader not to enable that feature.

--Joe

On Thu, Apr 28, 2011 at 12:21 PM, don warner saklad don.sak...@gmail.com wrote:

 What's the descriptive technical terminology information professionals
 use to distinguish the kind of PDF that can't be used with cut and paste,
 an image of the page of an article, versus the PDF format where it's
 not an image of a page and can be used with the cut and paste
 mechanism?... What is the first example properly called in the
 information technology industry's technical terms?... What's the second
 example called in the descriptive technical language?



Re: [CODE4LIB] AquaBrowser Libraries Group

2009-10-26 Thread Joe Atzberger
I'm fairly confident there is not -- just that the new list intends to
(self-)select just licensed users.
--Joe

On Fri, Oct 23, 2009 at 3:23 PM, Cloutman, David
dclout...@co.marin.ca.us wrote:

 Interesting. Our catalog consortium just bought Aquabrowser. Is there
 some sort of NDA that you know of that would limit the discussion to
 private forums? I hadn't heard of such a thing, but then maybe no one
 thought to tell me.



 ---
 David Cloutman dclout...@co.marin.ca.us
 Electronic Services Librarian
 Marin County Free Library

 -Original Message-
 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Gabriel Farrell
 Sent: Thursday, October 22, 2009 3:02 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] AquaBrowser Libraries Group


 While the "Interesting difference..." bit may be read as snarky, I
 appreciated Jeffrey's post for pointing out that most discussions about
 AquaBrowser can't take place on this list due to its lack of membership
 restrictions.


 On Thu, Oct 22, 2009 at 10:45:24AM -0400, Edward M. Corrado wrote:
  I don't see this as an interesting difference at all. Almost all
  [larger] vendor-supplied products in the library world have their
  own discussion lists that are limited to people that use/license
  their products. We even see this with Open Source products such as
  Koha. Although I do not use AquaBrowser, I understand that, unlike
  almost all other library-specific software of this magnitude,
  AquaBrowser does not have a user group (formal or informal). There
  currently are very few ways (no way?) for users of this product to
  converse with each other and share ideas.
 
  There are numerous reasons for wanting to share information on a
  closed list. They can range from not wanting to spam a larger
  community with a "how do I activate a widget in product A" question
  to asking questions/sharing information that for whatever reason you
  don't want to or can't share with the whole world (e.g. non-disclosure
  agreements, public relations concerns, privacy concerns, not wanting
  your name in open archives attached to something, etc.).  In fact,
  in some cases you may not even want the vendor on the list, the way
  some Voyager systems administrators created a list that excluded
  Endeavor (and now Ex Libris) and non-systems people at Voyager
  sites. This made people feel much more comfortable asking questions
  that maybe they would otherwise be embarrassed or reluctant to ask.
 
  I applaud Kathryn for taking the initiative to organize the
  AquaBrowser community by creating the AquaBrowser Libraries Group.
  From what I understand from people that use the product this is
  something that is overdue for the community.
 
  What the library technology world needs is more people like Kathryn
  that try to build community to help each other with whatever
  software product they are using. Sure, in a perfect world maybe
  everything would be completely Open but that is not reality. People
  that take initiative should be praised. They should not be met with
  snarky comments.
 
  Edward
 
  -Original Message-
  From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU
  mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
  Barnett, Jeffrey
  Sent: Thursday, October 22, 2009 9:05 AM
  To: CODE4LIB@LISTSERV.ND.EDU mailto:CODE4LIB@LISTSERV.ND.EDU
  Subject: Re: [CODE4LIB] AquaBrowser Libraries Group
 
  Good point Ed, but I think by the phrase "Licensed sites only" the
  intent of the AquaBrowser discussion _is_ to exclude open source.
  Interesting difference...
 
  -Original Message-
  From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU
  mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
  Ed Summers
  Sent: Wednesday, October 21, 2009 9:19 PM
  To: CODE4LIB@LISTSERV.ND.EDU mailto:CODE4LIB@LISTSERV.ND.EDU
  Subject: Re: [CODE4LIB] AquaBrowser Libraries Group
 
  You should also feel free to discuss AquaBrowser on here too ... the
  code4lib discussion isn't limited to open source software.
 
  //Ed
 
  On Wed, Oct 21, 2009 at 4:32 PM, Kathryn Frederick
  kfred...@skidmore.edu mailto:kfred...@skidmore.edu wrote:
   Please excuse cross-posting.
  
   I've set up an AquaBrowser Google Group to share tips and post
   questions. If your library uses AquaBrowser, please consider
 joining.
   This group is restricted, email me at kfred...@skidmore.edu
  mailto:kfred...@skidmore.edu and I'll
   send you an invite.
  
   Licensed sites only, please.
  
   Thanks,
   Kathryn



Re: [CODE4LIB] find in page, diacritics, etc

2009-08-28 Thread Joe Atzberger
Something like the jQuery highlight function combined with this kind of
mapping:

http://stackoverflow.com/questions/863800/replacing-diacritics-in-javascript

If you don't mind losing case sensitivity, you can speed things up by
forcing both comparison sets into a single case.
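
The same fold-away-the-diacritics idea, sketched in Perl rather than
JavaScript (Unicode::Normalize ships with Perl; the JS version is
analogous):

    # Minimal sketch: decompose, then strip combining marks, so "Dvořák"
    # and "Dvorak" compare equal.  Assumes UTF-8 input.
    use Unicode::Normalize qw(NFD);

    sub fold_diacritics {
        my $s = shift;
        my $d = NFD($s);        # é -> e + combining acute accent
        $d =~ s/\p{Mn}//g;      # drop the combining marks
        return lc $d;           # also fold case, per the note above
    }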
--Joe

On Fri, Aug 28, 2009 at 12:17 PM, Tim Shearer sh...@ils.unc.edu wrote:

 Hi Folks,

 Looking for help/perspectives.

 Anyone got any clever solutions for allowing folks to find a word with
 diacritics in a rendered web page regardless of whether or not the user
 tries with or without diacritics.

 In indexes this is usually solved by indexing the word with and without, so
 the user gets what they want regardless of how they search.

 Thanks in advance for any ideas/enlightenment,
 Tim



Re: [CODE4LIB] digital storage

2009-08-27 Thread Joe Atzberger
On Thu, Aug 27, 2009 at 4:25 PM, Edward M. Corrado ecorr...@ecorrado.us wrote:

 Nate Vack wrote:

 On Thu, Aug 27, 2009 at 1:57 PM, Ryan Ordway rord...@oregonstate.edu
 wrote:


 $213,360 over 3 years


  If you're ONLY looking at storage costs, SATA drives in enterprise RAID
 systems range from about $1.00/GB to about $1.25/GB for online storage.


 Yeah -- but if you're looking only at storage costs, you'll have an
 inaccurate estimate of your costs. You've got power, cooling, sysadmin
 time, and replacements for failed disks. If you want an
 apples-to-apples comparison, you'll want an offsite mirror, as well.

 I'm not saying S3 is always cost-effective -- but in our experience,
 the costs of the disks themselves is dwarfed by the costs of the
 related infrastructure.

  I agree that the cost of storage is only one factor. I have to wonder
 though, how much more staff time do you need for local storage than cloud
 storage? I don't know the answer but I'm not sure it is much more than
 setting up S3 storage, especially if you have a good partnership with your
 storage vendor.


Support relationships, especially regarding storage, are very costly.  When I
worked at a midsize datacenter, we implemented a backup solution with
STORServer and Tivoli.  Both hardware and software were considerably
costly.  Initial and ongoing support, while indispensable, was basically as
much as the cost of the hardware every few years.


 With cloud storage you still need other backups and mirrors, so I don't see
 the off-site mirror as an argument in favor of the cloud. You should have
 that redundancy either way.


You have the original, and the copy, wherever it is.  So you can build a rack
elsewhere (and reintroduce power, cooling, security and bandwidth costs), or
get a tape rotation scheme in place, or whatever, but a cloud-based backup
is already offsite, whereas an in-house tape library (like our STORServer)
still requires a staffer to populate the lockbox to be picked up (we used
Iron Mountain, then later Cintas).


 Yes, maybe you save on staff time patching software on your storage array,
 but that is not a significant amount of time - esp. since you are still
 going to have some local storage, and there isn't much difference in staff
 time in doing 2 TB vs. 20 TB.


There's a real difference.  I can get 2 TB in a single HDD, for example this
one for $200 at NewEgg:
http://www.newegg.com/Product/Product.aspx?Item=N82E16822148413

Any high school kid can install that.  20 TB requires some kind of
additional structure and additional expertise.

You may save some time on the initial configuration, but you still need to
 configure cloud storage. Is cloud storage that much easier/less time
 consuming to configure than an iSCSI device? Replacement for disks would be
 covered under your warranty or support contract (at least I would hope you
 would have one).


Warranties expire and force you into ill-timed, hardly-afforded and
dangerous-to-your-data upgrades.  Sorta like some ILS systems with which we
are all familiar.  The cloud doesn't necessarily stay the same, but the part
you care about (data in, data out) does.


 The power and cooling can be a savings, but in many cases the library or
 individual departments don't pay for electricity, so while *someone* pays
 the cost, it might not be the individual department. Cooling and electricity
  costs are actually a great argument for tape for large-scale storage.
  Tape might seem old-fashioned, but in many applications it by far offers the
  best value in long-term storage per GB.


It's true, tape is still a worthwhile option. Alternatives like optical or
magneto-optical media just have not kept up.

Again, I'm not totally against the cloud and there are some things I think
 it could be very useful for, but the cloud doesn't make up for the lack of
 (or just bad) planning.


Yeah, there's no system good enough to compensate for bad planning and
management.
--Joe


Re: [CODE4LIB] Long way to be a good coder in library

2009-07-22 Thread Joe Atzberger
It's about time to make this thread a wiki post.
--Joe


Re: [CODE4LIB] tools for massaging metadata

2009-07-09 Thread Joe Atzberger
On Thu, Jul 9, 2009 at 10:15 AM, Avila, Regina L. regina.av...@nist.gov wrote:

 Can anybody share some good tools for massaging metadata? For anything from
 file renaming to cleaning ASCII characters to various formulas?  I know
 Excel does a lot of things but I'm looking for other useful software to
 consider. I'm familiar with Parserat, Notepad++, A Better File Rename and a
 few others.  Any other gems I should know?


Since I work mostly with MARC records:

   - MarcEdit
   - MARC::Record, MARC::Lint
   - and the more general: sed, egrep, etc.

A validator for your given data format is usually important for keeping your
data compliant.
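
For example, a bare-bones lint pass over a file of records looks like this
(a sketch; the file name is made up):

    #!/usr/bin/perl
    # Report MARC::Lint warnings for every record in records.mrc.
    use strict;
    use warnings;
    use MARC::Batch;
    use MARC::Lint;

    my $batch = MARC::Batch->new('USMARC', 'records.mrc');
    my $lint  = MARC::Lint->new;
    while (my $record = $batch->next) {
        $lint->check_record($record);
        print $record->title, ": $_\n" for $lint->warnings;
    }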

--Joe


Re: [CODE4LIB] PHP/MySQL: sanitizing file uploads to DB

2009-06-05 Thread Joe Atzberger
Sounds like you might need something like a SQL version of HTML::Scrubber.

The more important thing is to use prepared statements with placeholders, so
that you can't get server execution injected on.  Then worry about JavaScript
or HTML scrubbing.
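
In Perl/DBI terms (a sketch; the table and column names are invented), that
means something like:

    # Placeholders keep the data out of the SQL string entirely, so a value
    # like "Robert'); DROP TABLE Students;--" gets stored, not executed.
    use DBI;

    my $dbh = DBI->connect('dbi:mysql:mydb', 'user', 'pass',
                           { RaiseError => 1 });
    my $sth = $dbh->prepare('INSERT INTO uploads (label, value) VALUES (?, ?)');
    while (my $line = <STDIN>) {
        chomp $line;
        my ($label, $value) = split /\t/, $line, 2;
        $sth->execute($label, $value);   # DBI handles quoting/escaping
    }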

-- 
Joe Atzberger
LibLime - Open Source Library Solutions

On Fri, Jun 5, 2009 at 10:30 AM, Kenneth R. Irwin kir...@wittenberg.edu wrote:

 Hi folks,

 Can someone point me to some good information/how-to-guide/etc for
 sanitizing files uploaded to a MySQL database through a web interface? (This
 would be something much like the "Insert data from a textfile into table"
 function in phpMyAdmin.) I want to make sure there aren't any nasty queries
 inserted into the tab-delimited data.

 I.e., don't let this happen to you: http://xkcd.com/327/

 Is this whole-file sanitization any different than the sort of thing you
 might use for individual pieces of data? E.g.
 http://www.denhamcoote.com/php-howto-sanitize-database-inputs

 Any advice would be appreciated.

 Thanks!
 Ken



Re: [CODE4LIB] A Book Grab by Google

2009-05-19 Thread Joe Atzberger
 BTW, we are sponsoring a mini-symposium on the topic of mass digitization
 here at Notre Dame, tomorrow:

  http://www.library.nd.edu/symposium/


Nice timing.

--joe


Re: [CODE4LIB] Curious about Cell Phone Barcode Scanning Apps

2009-05-08 Thread Joe Atzberger
Google provided the barcode-recognition line-interpolation software as open
source for Android developers to build on. That explains why I have about 4
barcode-scanning apps on the G1.

Note that most common cellphone cameras haven't advanced enough to get
reliable resolution for barcodes, in particular at the up-close, macro-like
distances you would use a scanner at.  My old Nokia, despite its 3 MP
camera, couldn't get focus up close.

In a year or two that should be different for the currently available
models.

--Joe

On Fri, May 8, 2009 at 10:39 AM, Matt Amory matt.am...@gmail.com wrote:

 I'm interested in some advice on building an app to pickup barcode data
 through a cell phone camera and return OPAC/Library Thing/WorldCat etc.
 results to a mobile interface.
 I know that Android has a UPC barcode reader linked to a shopping app, and
 I'm wondering if this can be used or repurposed, or if there's a better
 place to begin.

 Thanks!



Re: [CODE4LIB] Recommend book scanner?

2009-05-01 Thread Joe Atzberger
On Fri, May 1, 2009 at 5:39 PM, Mike Taylor m...@indexdata.com wrote:


 If you want real 300 dpi images, at anything like the quality you get
 from a flatbed scanner, then you're going to need cameras much more
 expensive than $100.


Or just wait, say, about 3 years.


Re: [CODE4LIB] resolution and identification (was Re: [CODE4LIB] registering info: uris?)

2009-04-14 Thread Joe Atzberger
The User Agent is understood to be a typical browser, or other piece of
software, like wget, curl, etc.  It's the thing implementing the client side
of the specs.  I don't think you are operating as a user agent here as
much as you are a server application.  That is, assuming I have any idea
what you're actually doing.

--Joe

On Tue, Apr 14, 2009 at 11:27 AM, Jonathan Rochkind rochk...@jhu.edu wrote:

 Am I not an agent making use of a URI who is attempting to infer properties
 from it? Like that it represents a SuDoc, and in particular what that SuDoc
 is?

 If this kind of talmudic parsing of the TAG reccommendations to figure out
 what they _really_ mean is neccesary, I stand by my statement that the
 environment those TAG documents are encouraging is a confusing one.

 Jonathan


 Houghton,Andrew wrote:

 From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
 Jonathan Rochkind
 Sent: Tuesday, April 14, 2009 10:21 AM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] resolution and identification (was Re:
 [CODE4LIB] registering info: uris?)

 Over in: http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-2006-08-17.html

 They suggest, under "URI opacity": 'Agents making use of URIs SHOULD NOT
 attempt to infer properties of the referenced resource.'

 I understand why that makes sense in theory, but it's entirely
 impractical for me, as I discovered with the SuDoc experiment (which
 turned out to be a useful experiment at least in understanding my own
 requirements).  If I get a URI representing (eg) a Sudoc (or an ISSN,
 or an LCCN), I need to be able to tell from the URI alone that it IS a
 Sudoc, AND I need to be able to extract the actual SuDoc identifier
 from it.  That completely violates their Opacity requirement, but it's
 entirely infeasible to require me to make an individual HTTP request
 for every URI I find, to figure out what it IS.



 Jonathan, you need to take URI opacity in context.  The document is
 correct
 in suggesting that user agents should not attempt to infer properties of
 the referenced resource.  The Architecture of the Web is also clear on
 this
 point and includes an example.  Just because a resource URI ends in .html
 does not mean that HTML will be the representation being returned.  The
 user agent is inferring a property by looking at the end of the URI to see
 if it ends in .html, e.g., that the Web Document will be returning HTML.
  If you really want to know for sure you need to dereference it with a HEAD
 request.

 Now having said that, URI opacity applies to user agents dealing with
 *any*
 URIs that they come across in the wild.  They should not try to infer any
 semantics from the URI itself.  However, this doesn't mean that the minter
 of a URI cannot create a policy decision for a group of URIs under their
 control that contain semantics.  In your example, you made a policy
 decision about the URIs you were minting for SUDOCs such that the actual
 SUDOC identifier would appear someplace in the URI.  This is perfectly
 fine and is the basis for REST URIs, but understand you created a specific
 policy statement for those URIs, and if a user agent is aware of your
 policy
 statements about the URIs you mint, then they can infer semantics from
 the URIs you minted.

 Does that break URI opacity from a user agent's perspective?  No.  It just
 means that those user agents who know about your policy can infer
 semantics
 from your URIs and those that don't should not infer any semantics because
 they don't know what the policies are, e.g., you could be returning PDF
 representations when the URI ends in .html, if that was your policy, and
 the only way for a user agent to know that is to dereference the URI with
 either HEAD or GET when they don't know what the policies are.


 Andy.






Re: [CODE4LIB] Something completely different

2009-04-09 Thread Joe Atzberger
On Thu, Apr 9, 2009 at 10:26 AM, Mike Taylor m...@indexdata.com wrote:

 ... anyway, all of this is far, far away from the point.  MARC is old
 and ugly yes; but then so am I, and I get the job done, just like
 MARC.  That format is responsible for about 0.2% of our difficulties,
 and replacing it would make essentially no difference to anything that
 we actually care about.


The *encoding*, however, is responsible for about 20% of my difficulties.
MARC-8 should die...
--Joe


Re: [CODE4LIB] extra computers

2009-03-17 Thread Joe Atzberger
Check for a local branch of freegeek for rehabilitation and environmental
disposal:

http://www.freegeek.org/

Columbus has one, so South Bend might too.

--Joe

On Tue, Mar 17, 2009 at 1:28 PM, Jim Tuttle j...@braggtown.com wrote:

 Eric Lease Morgan wrote:
  How do y'all suggest I put to good use the increasing number of extra
  computers I have lying around my house?
 
  You would think I was starting a computer museum with the number of
  decommissioned computers I have at home. A few Macintoshes and a couple
 of
  Intel-based machines. (Not to mention the TI-99A, or whatever.) I don't
  really need backup. I don't really need a Web server. Maybe I could use
  these computers as some sort of CPU Farm to do some sort of interesting
  computing.
 
  Any suggestions?
 


 One possibility would be to find somewhere to donate them.  Many cities
 have non-profits that recycle computers to low-income families and
 provide training.

 Jim

 --
 *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
 Jim Tuttle
 http://braggtown.com





Re: [CODE4LIB] Free cover images?

2009-03-16 Thread Joe Atzberger
This comes up in Koha development all the time.  We support a sizable number
of libraries using Amazon images (and other content) in Koha.  Every now and
again a client reads that same clause and alerts us to this threatening
legal snafu.  Josh has had some protracted contact with Amazon's US legal
department, and in the end they don't object, implement blocks, or send
Cease and Desist letters.

In a legalistic frame of mind, I would say that the "principal purpose"
clause needs some interpretation to make any sense.  For one, it targets
user intent rather than effect.  Then it depends on the scope of their
"Application".

Then there are simple questions like:

   - do they expect my website to be more interesting and more informative
   with AWS?
   - do they expect me as a website owner to be *more* concerned with
   driving traffic to Amazon than my own content and traffic?

The clause says yes, but can't actually mean it.  And, in fact, it isn't
enough to drive traffic to Amazon... it has to be driving traffic to the
Amazon Website *and* driving sales.  This gives you a clue as to their real
intent: they want to prohibit commercial competition from jacking their
content.  And they don't want you to think you deserve a bigger cut, or
complain that "no, I only used AWS because I wanted to BAN that book.
Therefore you must forfeit sales from people linking from my site."

So you can still fit inside their overwrought language if you define the
jacket-images-from-Amazon part to be its own Application, honestly
include the link back to them somewhere, and understand that they intend to
take linkers and try to sell them stuff.  That's the purpose of the link.

In our case, it isn't a big threat.  We know they know they get traffic from
us, including people who specifically want a given title that is currently
unavailable at their local library, i.e. high-value traffic.  If they want
to throw the switch and implement a block, clients will start driving
traffic to Google (or fill-in-the-blank) instead.

--Joe,
LibLime

On Mon, Mar 16, 2009 at 4:03 PM, Kyle Banerjee kyle.baner...@gmail.com wrote:

 Yah, but the same could be said for Amazon. From
 http://aws.amazon.com/agreement/

 5.1.3. You are not permitted to use Amazon Associates Web Service with
 any Application or for any use that does not have, as its principal
 purpose, driving traffic to the Amazon Website and driving sales of
 products and services on the Amazon Website.

 Maybe libraries are under the radar, and maybe Amazon doesn't care,
 but getting addicted to this stuff is not without risk. If the load
 ever became something they cared about, they could turn it off in a
 snap.

 kyle

 On Mon, Mar 16, 2009 at 12:53 PM, Jonathan Rochkind rochk...@jhu.edu
 wrote:
  You can get cover images from WorldCat? How?  I'm pretty sure the
  WorldCat ToS specifically disallow you from re-using those covers, even if
  you are managing to get them via machine access somehow.
 
  Lynch,Katherine wrote:
 
  Going along with Jonathan Rochkind, Amazon does a good job of supplying
  some movie images.  Also in general, WorldCat, if that's an option to
  you.  For a good example of wealth/response time, check out Gabe's video
  search:
  http://www.library.drexel.edu/video/search
 
  ---
  Katherine Lynch
  Library Webmaster
  Drexel University Libraries
  215.895.1344 (p)
  215.895.2070 (f)
 
 
  -Original Message-
  From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
  Edward M. Corrado
  Sent: Monday, March 16, 2009 2:38 PM
  To: CODE4LIB@LISTSERV.ND.EDU
  Subject: [CODE4LIB] Free cover images?
 
  Hello all,
 
  We are reevaluating our source of cover images. At this point I have
  identified four possible sources of free images:
 
  1. Amazon
  2. Google Books
  3. LibraryThing
  4. OpenLibrary
 
  I know that there is some question whether the Amazon and Google Books
  images will allow this (although I've also yet to hear Amazon or Google
  telling libraries that use their Web services for this to cease and
  desist). However, besides that issue, has anyone noticed any technical
  problems with any of these four? I'm especially concerned about slow
  and/or inconsistent performance.
 
  Edward
 
 
 



 --
 --
 Kyle Banerjee
 Digital Services Program Manager
 Orbis Cascade Alliance
 baner...@uoregon.edu / 503.999.9787



Re: [CODE4LIB] Free cover images?

2009-03-16 Thread Joe Atzberger
The bizarre part of it is that they insist *Amazon's purpose* become the
primary purpose of *your* Application.  This is weird if you think of an
entire ILS as the Application, since nobody could reasonably argue the
overall purpose is to get Amazon more hits and sales.

It requires the terminological gymnastics I just described to control the
scope of the Application (and therefore of their Terms).  Other than that, I
think everybody here should be OK w/ the link back condition, tastefully
implemented.

--Joe

On Mon, Mar 16, 2009 at 4:50 PM, Nate Vack njv...@wisc.edu wrote:

 On Mon, Mar 16, 2009 at 3:30 PM, Jonathan Rochkind rochk...@jhu.edu
 wrote:

  However, my understanding is that Worldcat forbids any use of those cover
  images _at all_.  This is much more clear cut, and OCLC is much more
 likely
  to care, then Amazon's more bizarre restrictions as to purpose.

 How is Amazon's restriction bizarre? As far as I can read, they're
 saying "hey, if you're using our data, we ask that you drive traffic to
 us, OK?" That's totally reasonable; they, you know, sell books for a
 living, and their API services aren't free to support.

 If you're using Amazon's cover images, you should provide a way for
 Amazon to capitalize on that usage. Even if they don't cut you off
 (because they don't catch you or don't care), linking to them is still
 the morally right thing to do.

 Cheers,
 -Nate



Re: [CODE4LIB] more comments on award idea

2009-03-11 Thread Joe Atzberger
I appreciate Jonathan sounding out the arguments against the proposed form
of the award, and offering some alternatives.  In short, I think I agree
with him.

I was at Karen's "OSS Metrics" breakout session, and had a lot of
reservations about the output of the session, even though the discussion
there was interesting and well-intentioned.  It comes down to two
decision-making processes: the internal c4l one for making the award and the
external one(s) being influenced by it.

We were listing criteria one might use to evaluate a given project.  And it
was a good enough list of issues, but I kept thinking that it was bound to
fail if it were a scorecard to be used *comparatively* between otherwise
heterogeneous projects on different platforms, in different environments,
with different purposes, etc.  I wasn't even confident of our ability to
review one individual criterion like "security" for a given project, let
alone amongst all projects.  For the amount of work and expertise it would
take to evaluate that honestly, we could be contributing *fixes* to even the
lesser projects.

But I'll put aside the question of how accurately we could pick amongst
totally diverse projects.  Pretend we could.  I don't think we could
communicate the objective context to the external decision makers who would
consider themselves informed by the mere fact of the award.

The Journal featuring a project has none of these problems, because it can
maintain context.  Like "Is this project useful to archivists in major
institutions?" or "Is this OSS project a good alternative to a different
proprietary software X?"

I also like the role of code4lib being more of a contributor and less of an
arbiter.  If the goal is to benefit the cool projects, keep the money, show
me the code.

--Joe Atzberger,
LibLime

On Wed, Mar 11, 2009 at 12:46 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 As I think about the award idea more, I still don't really like it. (Sorry
 Eric!).

 Some comments at
 http://bibwild.wordpress.com/2009/03/09/why-i-dont-like-the-code4lib-code-award-idea/

 With a shorter version below (thanks Jodi).

 The award will inevitably be seen as an endorsement of the awarded project
 by ‘Code4Lib.’ While some supporters say this is not the intention, I’ve
 also seen supporters say the reason they want the Code4Lib name on it is so
 the award will have more prestige. To me, this implies that an implied
 endorsement in fact is part of the idea: What else would this prestige be
 for? But whether it’s intentional or not, it’s inevitable.

 The Code4Lib community has indeed garnered a fair amount of prestige
 lately, including by people who don’t really understand the informal and
 non-official nature of Code4Lib. I’ve seen Code4Lib erroneously referred to
 as an ‘organization’ several times. Much of this audience will see such an
 award as an endorsement of the project awarded, by the prestigious
 ‘Code4Lib’.

 But I don’t think Code4Lib actually has the capacity to accurately and
 useful determine value of an open source project.

 Libraries need to learn how to evaluate open source projects on their own,
 for their own circumstances and needs. Libraries, always on the look-out for
 shortcuts, are going to be really tempted to use a Code4Lib award as a
 shortcut to their own investigation. If it’s awarded by Code4Lib, it must be
 good. I worry about anything that discourages libraries from the hard work
 of developing their own capacity to evaluate projects; and I also worry
 about such an implied endorsement actually steering them wrong because I
 don’t think we have the capacity to reliably make such universally
 applicable evaluations as a community. Sure, the award won’t be intended as
 such, but it will be read as such.

 I would actually love to see a regular “notable project review” feature in
 the Code4Lib Journal, perhaps in every issue. This could cover only projects
 that the reviewers thought were exceptionally good, or it could cover any
 project of note.

 And reviews would have particular reviewer’s bylines attached, making it
 clear who was doing the evaluation, and discouraging the reader from
 thinking it’s the “Code4Lib community”, which isn’t capable of speaking with
 one voice anyway (nor do we desire it to).

 If the goal of the idea is to inject some money into library-domain open
 source software development, then rather than an award with compensation, I
 think the money could more effectively be spent funding an internship of
 some kind.

 Perhaps something like Google Summer of Code. Give a stipend to some
 library student (or currently un- or under-employed Code4Libber, but I like
 the idea of getting library students involved as bonus) to work on a
 Code4Lib community project. Perhaps the community could vote on which
 project(s) were eligible for such an internship, and then people could apply
 expressing their interests, and a smaller committee would actually match an
 intern with a project.



Re: [CODE4LIB] perl wrapper for yaz-marcdump

2009-01-30 Thread Joe Atzberger
On Fri, Jan 30, 2009 at 4:12 PM, Eric Lease Morgan emor...@nd.edu wrote:

 Is there any way I can make my Perl wrapper for yaz-marcdump, below, more
 efficient?


Dump as you go rather than reading the whole thing into memory.  Actually,
why do you need Perl at all?  This is just a regular yaz-marcdump call.
Pipe it into your indexer.

Otherwise, skip the subroutine and just do:


  open(C, "$y $file |") or die "Can't open converter: $!\n";
  print while (<C>);   # stream each converted line straight to stdout
  close C;


Wouldn't your code have been getting only the last line $r returned?
Perhaps you intended to append to $r each pass through the loop.

--Joe


Re: [CODE4LIB] release management

2008-10-29 Thread Joe Atzberger
I see your SVN and raise you one git.

http://git.or.cz/

Phil is right though, articulate version control is the only technical way
to keep diverse coders working on the same project.  Git takes a distributed
approach and changes certain philosophical underpinnings of how to manage
source.  You may have seen my LibLime coworker Galen present on git at the
last code4lib con.  You can catch the video for that here:

http://video.google.com/videosearch?q=code4lib+2008&so=1&sitesearch=#q=code4lib%202008%20Galen&emb=0&so=1

Personally, I haven't found any reason to go back to SVN.

--Joe Atzberger

On Wed, Oct 29, 2008 at 10:49 AM, Phil Cryer [EMAIL PROTECTED] wrote:

 On Wed, 2008-10-29 at 10:30 -0400, Jonathan Rochkind wrote:
  Can anyone recommend any good sources on how to do 'release management'
  in a small distributed open source project. Or in a small in-house not
  open source project, for that matter. The key thing is not something
  assuming you're in a giant company with a QA team, but instead a small
  project with a a few (to dozens) of developers, no dedicated QA team,
 etc.
 
  Anyone have any good books to recommend on this?

 I would recommend you start using subversion, if you don't want to/can't
 setup your own server, there are places online you can use it for free:

 http://code.google.com/hosting/
 http://www.assembla.com/
 http://unfuddle.com/

 A slight learning curve, but necessary if you want to collaborate.

 P

 
  Jonathan
 
 --
 Phil Cryer | Open Source Dev Lead | web www.mobot.org | skype phil.cryer



Re: [CODE4LIB] Code4Lib Logo

2008-09-29 Thread Joe Atzberger
On Tue, Sep 23, 2008 at 10:01 AM, Nicolas Morin
[EMAIL PROTECTED] wrote:

 On Tue, Sep 23, 2008 at 3:56 PM, wally grotophorst [EMAIL PROTECTED]
 wrote:

  I'll risk ostracism and admit that I think this concern with a logo is a
  little too corporate for my sensibilities.

 But then that'd be part of the guidelines given to the designer: the logo
 shouldn't look too corporate if it's to represent what the code4lib
 community is about...
 Nicolas


Actually, his beef appears to be with the group's concern itself, regardless
of any logo produced.  Is that a correct interpretation, Wally?

It would be a logical entailment that if the group can't consider producing
a logo, it either goes on without one or maybe lucks into having one (or
several, perhaps of varying quality) with some unstable *de facto*
consensus.  To me, the results of this approach tend to look amateurish
(including my own).

I think code4lib should have a quality logo, and therefore should have an
open and deterministic process for producing and selecting one.  This fairly
rudimentary level of organization really has nothing to do with
corporateness.  My family picks the photo they want to send out with the
Christmas cards, but that doesn't make us a corporation.

If there is a persuasive case to be made *against* pursuing a logo for the
group, please consider now the time to make it...

--joe atzberger


Re: [CODE4LIB] Query: Standalone - log file code - for tracking CDRom Usage

2008-07-22 Thread Joe Atzberger
Notably, I think this is what Sony's hackware/spyware did some years ago (or
attempted to), amongst others, so it should be possible.

I think you probably want to pull data from the System event log though, at
which point you can defer to whatever log-consolidation or event-scanning
tool your systems people prefer for your platform, or perhaps you want an
intervening piece of software like "CDROM Watchdog".  Not sure what else
there is beyond that.
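
If the events do land in the System log, a rough Perl sketch with
Win32::EventLog (assumes Windows, and that your CD-ROM driver logs under a
source matching /cdrom/ -- worth verifying before trusting the numbers):

    use strict;
    use warnings;
    use Win32::EventLog;

    my $log = Win32::EventLog->new('System') or die "can't open System log\n";
    my ($count, $oldest) = (0, 0);
    $log->GetNumber($count);    # total records in the log
    $log->GetOldest($oldest);   # offset of the oldest record
    for my $i (0 .. $count - 1) {
        my $event = {};
        $log->Read(EVENTLOG_FORWARDS_READ | EVENTLOG_SEQUENTIAL_READ,
                   $oldest + $i, $event);
        next unless ($event->{Source} || '') =~ /cdrom/i;
        print scalar localtime($event->{TimeGenerated}),
              "\t$event->{Source}\n";
    }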

--Joe

On Tue, Jul 22, 2008 at 9:07 AM, Svarckopf, Jennifer [EMAIL PROTECTED]
wrote:

 Here at Justice we have a number of standalone computers with CDRoms
 (I'm totally new to my job) and the new Collection Development Librarian
 would like to find out how much the CDRoms are used.  I've found a few
 references in the late nineties to a log file that can track which CD
 Roms have been used and for how long.  Does anyone have something like
 this they can share?  Any other ideas?  Thanks so much.

 Cheers,
 Jennifer
 Jennifer Svarckopf
 613-957-4592
 [EMAIL PROTECTED]



Re: [CODE4LIB] BarCampOhio and LibraryCampOhio, August 11, 2008

2008-07-17 Thread Joe Atzberger
Sounds good!

On Thu, Jul 17, 2008 at 3:26 PM, Peter Murray [EMAIL PROTECTED] wrote:

 All of the details, including stuff not covered below, are on the event
 homepage.


Did I miss the URL, or are you holding out on us?   : )

--joe


Re: [CODE4LIB] Free covers from Google

2008-03-15 Thread Joe Atzberger
Impressive!  As luck would have it, I'm working on the question of book
images in Koha this week...
--joe atzberger

On Sat, Mar 15, 2008 at 3:14 AM, Godmar Back [EMAIL PROTECTED] wrote:

 Hi Tim,

 I think this proposal suffers from the same shortcoming as
 LibraryThing's widgets, which is that only one per page is allowed. A
 better way may be to use spans and classes and keep the JavaScript in
 a library.
 I've attached the resulting HTML below; see http://libx.org/gbs/ for a
 demo.

  - Godmar

 --- index.html:
 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
 <html>
 <head>
 <script src="http://libx.org/gbs/gbsclasses.js"
 type="text/javascript"></script>
 <title>Simple Demo for Google Book Classes</title>
 </head>

 <body>
  <span title="ISBN:0743226720" class="gbs-thumbnail"></span>
  <span title="ISBN:0061234001" class="gbs-thumbnail"></span>
  <span title="ISBN:1931798230" class="gbs-thumbnail"></span>

  <span title="ISBN:0596000278" class="gbs-thumbnail"></span>
  <span title="0439554934"  class="gbs-thumbnail"></span>
  <span title="OCLC:60348769"   class="gbs-thumbnail"></span>
  <span title="LCCN:2004022563" class="gbs-thumbnail"></span>
 </body>
 </html>

 On Sat, Mar 15, 2008 at 2:04 AM, Tim Spalding [EMAIL PROTECTED]
 wrote:
  (Apologies for cross-posting)
 
   I just posted a simple way to get free book covers into your OPAC. It
   uses the new Google Book Search API.
 
 
 http://www.librarything.com/thingology/2008/03/free-covers-for-your-library-from.php
 
   I think Google has as much cover coverage as anyone. The API is free.
   Most libraries pay. I'm thinking this is a big deal?
 
   We'll probably fancy it up a bit as an add-on to our LibraryThing for
   Libraries service, but the core idea can be implemented by anyone.
 
   I look forward to refinements.
 
   Tim
 
   --
   Check out my library at http://www.librarything.com/profile/timspalding
 



Re: [CODE4LIB] perl6

2008-01-22 Thread Joe Atzberger
We get an honest-to-god switch statement, finally.  And better regexp
optimization.  I too plan to convert relatively slowly, starting with
5.10for now and re-reading perldelta.
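
For the curious, the 5.10 switch looks like this (a toy sketch; note that
much later perls deprecated given/when and eventually removed it):

    use feature qw(switch say);   # requires perl 5.10

    my $format = shift || 'marc';
    given ($format) {
        when ('marc')    { say 'binary MARC21' }
        when ('marcxml') { say 'MARCXML' }
        default          { say "unknown format: $format" }
    }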

--joe

On Jan 21, 2008 8:00 AM, Eric Lease Morgan [EMAIL PROTECTED] wrote:

 Just what will I be able to do better
 with this (completely) new version?



Re: [CODE4LIB] [Web4lib] Library Staff Scheduler

2007-09-05 Thread Joe Atzberger
You might consider RSS syndication as a third possible means of publishing a
schedule, or rather, as an alternative to directly dumping HTML.  Clearly it
would take more work than just generating a printout, but the
interoperability is sweet.
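
A sketch of how cheap that can be with XML::RSS (the schedule data and URLs
here are invented; real items would come from the scheduling database):

    use XML::RSS;

    my $rss = XML::RSS->new(version => '2.0');
    $rss->channel(
        title       => 'Reference Desk Schedule',
        link        => 'http://library.example.edu/schedule',
        description => 'Who is on the desk, updated nightly',
    );
    $rss->add_item(
        title       => 'Wed 9-12: Helen (chat + desk)',
        link        => 'http://library.example.edu/schedule#wed-am',
        description => 'Send swap requests to the schedule coordinator.',
    );
    print $rss->as_string;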
--Joe

On 9/5/07, Helen Chu [EMAIL PROTECTED] wrote:

 By "publish" I would like to be able to do one or more of the following:

 1. publish a public version to a public web site so we can see who (which
 person with which specific skills) will staff the desk

 2. print out on paper

 The shift swapping would be great so that our schedule coordinator doesn't
 have to spend his/her time juggling students' midterms schedules. We've got
 more complex work for our staff!

 Helen

 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 Sharon Foster
 Sent: Wednesday, September 05, 2007 2:18 PM
 To: CODE4LIB@listserv.nd.edu
 Subject: Re: [CODE4LIB] [Web4lib] Library Staff Scheduler

 By "publish", do you mean print to hard-copy, or something more?

 A swap board is an excellent feature. It beats leaving a note on the staff
 bulletin board: Can anyone swap with me for the week of November 7th?


 On 9/5/07, Helen Chu [EMAIL PROTECTED] wrote:
  Hi All,
 
  Been looking for a staff scheduling program too. I need additional
 functionality:
 
  - students should be able to trade shifts with each other
  - we can easily publish the schedule of who's working
 
  Anyone had any success with this?
 
  BTW, Deb, do you know Barron Koralesky? Good friend of mine.
 
  Thanks,
  Helen
 
  
  Helen Chu
  Director, Library Information Technology California Polytechnic
  University San Luis Obispo, CA 93401 [EMAIL PROTECTED]
 
 
  -Original Message-
  From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf
  Of Deb Bergeron
  Sent: Wednesday, September 05, 2007 9:19 AM
  To: CODE4LIB@listserv.nd.edu
  Subject: Re: [CODE4LIB] [Web4lib] Library Staff Scheduler
 
  Sharon,
 
  Thank you.  While our application may be different, I am interested in
 what you develop.  Are looking at starting development immediately?
 
  Please let me know your progress or if I can help in any way.
 
  Deb
 
  Sharon Foster wrote:
   Gotcha! My library is in a consortium as well, and there is a
   courier service, although since we are such a small state, it is
   actually a state-wide service, not just for our consortium.
  
   My initial reaction is that the application I have in mind *could*
   be used to set up a courier schedule, but instead of one desk and
   several people staffing it over the course of a day, you have one
   person moving to different desks (libraries) over the course of a
   day. I think that's a different enough pattern, along with the is
   it on time? requirement, to warrant its own application.
  
   The question I was asking was directed to public and academic
   library systems with more than one location or branch. Do you ever
   move people around among the branches? If so, then I want the
   scheduler to incorporate that.
  
  
   On 9/5/07, Deb Bergeron [EMAIL PROTECTED] wrote:
  
Sharon,
  
I think I  need to clarify.  We are an academic consortium of 14
   completely different libraries who share a common ILS, consequently
   we have no 'branches;' each library is independent.  Some of the
   libraries have their own branches or locations, however, and could
   use your scheduler application in their own library. So your
   question about staff being assigned to another branch does not
   apply in our case.  What does apply is knowing the library hours and
 academic calendar.
  
Our  office manages the ILS and all of its components. One of
   those components is the courier.  The courier picks up and delivers
   items to all of the consortial libraries as well as our state-wide
 ILL system (MINITEX).
   The courier schedule changes throughout the year and sometimes daily
 (i.e.
   storm, accident, traffic, etc.).  It would be great to have an
   online application indicating:
  
  
   Courier's schedule
   Is he on time?
   Issues
   If a library requests an additional pick-up Our goal is 24 hour
   turn-around and often-times it's less than that.
  
For both applications, it would be fabulous to have an online tool
   that provides all the information I've described.
  
I hope this clarifies the lay of our land for you.
  
  
Thanks,
  
Deb
  
  
  
Sharon Foster wrote:
Indeed! I hadn't even thought of multiple libraries in a system,
   since I haven't yet worked in a system with branch libraries.
  
   Is it ever the case that staff may be temporarily assigned to
   another branch, not their home branch?
  
   Are couriers thought of as assigned to a particular library, or are
   they part of the larger system?
  
   Thanks for your input!
  
   On 9/5/07, Deb Bergeron [EMAIL PROTECTED] wrote:
  
  
Sharon,
  
Kudos to you for taking this on!
 

Re: [CODE4LIB] executing a cgi script in the middle of a url

2007-08-01 Thread Joe Atzberger
Note also that, unless something has changed in more recent releases from
MS, if you attempt to use IIS instead of Apache, path_info() in Perl's CGI
won't work.

My (undirected) approach eventually led me to use mod_rewrite and regular
apache AliasMatch and ScriptAliasMatch commands.   Example:
___
RewriteEngine on
RewriteRule /barcodes/([inosx]?[0-9]+)\.js /cgi-bin/barcode.pl?$1 [E=BARCODE:$1]
RewriteRule /names/([A-z]+)\.js            /cgi-bin/name.pl?$1    [E=BARCODE:$1]
AliasMatch ^.*/images/(.*)        /var/apache/htdocs/my_app_1/images/$1
AliasMatch ^.*/css/(.*)           /var/apache/htdocs/my_app_1/css/$1
ScriptAliasMatch ^.*/cgi-bin/(.*) /var/apache/htdocs/my_app_1/cgi-bin/$1
___


The bracketed parts at the back end just set the environment variable
BARCODE, strictly optional.
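
On the Perl side, the script in the middle of the URL can then pick up its
argument; a sketch (note that mod_rewrite env vars may arrive prefixed with
REDIRECT_ after an internal redirect):

    #!/usr/bin/perl
    # Sketch for /cgi-bin/barcode.pl handling /barcodes/12345.js requests.
    use strict;
    use warnings;
    use CGI;

    my $q = CGI->new;
    my $barcode = $ENV{BARCODE} || $ENV{REDIRECT_BARCODE};
    if (!defined $barcode) {
        # Fall back to PATH_INFO -- which, per the caveat above, works
        # under Apache but not under IIS.
        ($barcode) = $q->path_info =~ m{^/([inosx]?\d+)};
    }
    print $q->header('application/javascript');
    print defined $barcode ? "// barcode: $barcode\n" : "// no barcode\n";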

--joe


Re: [CODE4LIB] Citation parsing?

2007-07-20 Thread Joe Atzberger

On 7/20/07, Eric Hellman [EMAIL PROTECTED] wrote:


Have people been able to do a decent job of identifying parts of
speech in natural language?



I think trying to import broad NLP findings into our narrower problem of
citation parsing is not likely to be fruitful, but on the other hand
stealing their tools seems perfectly reasonable, and this group seems to be
familiar with several.

About 8 years ago, I made use of a parser generator called ANTLR (ANother
Tool for Language Recognition) that takes an EBNF grammar spec and builds a
parser.  Since then developers have improved the tool with some new versions
and even a GUI development environment.  The languages recognized in
practice all seem to be well-defined programming languages, but if you
wanted to roll your own (new) parser for citations, ANTLR might help.

I think ANTLR satisfies Eric's first two criteria for flexibility and ease
of extension and might be used to satisfy the third (broad contextual
info).  It now includes a kind of ability to back itself out of rule descent
and try other alternatives in the tree if the static grammar fails.  The
license is BSD.  Notably, it supports Unicode, and the new version does NOT
require a pre-specified number of look-ahead tokens.  And the userbase is
fairly broad for such a specialized tool.

This might be considered an incongruous solution inasmuch as you are asking
for parser characteristics and I am recommending a parser generator that
*could* produce the kind of parser you want.  But I think that is
appropriate for the task described.

--joe


Re: [CODE4LIB] catholic portal

2007-06-25 Thread Joe Atzberger

You had me with the compelling illustration.  :)

I haven't implemented every piece in the puzzle, but it seems like a viable
setup.

On 6/25/07, Eric Lease Morgan [EMAIL PROTECTED] wrote:



Below is some text I wrote outlining the technical infrastructure for
a thing we colloquially call the catholic
portal (www.catholicresearch.net). Does the infrastructure make
sense to y'all? If it doesn't make sense to you, then it won't make
sense to non-technoweenies.

Catholic Research Resources Initiative and its technical infrastructure

This text outlines the proposed technical infrastructure for the
Catholic Research Resources Initiative (CRRI).

The infrastructure begins with two assumptions. First, from the
user's point of view, the system provides a searchable/browsable
interface to sets of EAD (Encoded Archival Description) files. Second,
the system makes every effort to provide this interface through well-
established Web-based protocols, thus making the underlying components
more modular.

Figure 1 illustrates the proposal. Starting on the far left are sets
of EAD files. These files will be created remotely at partner
institutions and sent to a central location. Once received, metadata
will be extracted and stored in a relational database along with the
entire EAD files. This metadata, in combination with a simple faceted
classification system, will provide a way to maintain and logically
organize the CRRI content. We propose to use MySQL as the relational
database and a set of object-oriented Perl modules called MyLibrary
to facilitate input/output against the database. [1, 2]

To facilitate search, a report will be written against the database
and given to an indexing program. The indexer/search engine is
expected to support fielded, free-text, and full-text searching, as
well as relevancy ranking. More importantly, the search engine is
expected to be accessible through a Web Services-based protocol
called SRU (Search-Retrieve via URL). [3] This will enable other
information services to search the CRRI without using the CRRI
website. Examples of other information services include metasearch
interfaces now common in libraries. The use of SRU will also enable
the CRRI to exchange its underlying indexing program without changing
the user interface. We plan to use either Zebra, Kinosearch, or
Lucene as our indexing program. [4, 5, 6]
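
(For a concrete sense of SRU, a searchRetrieve request is just a URL --
the endpoint below is hypothetical:

    http://crri.example.org/sru?operation=searchRetrieve&version=1.1&query=%22Dorothy%20Day%22&maximumRecords=10

and the response comes back as XML, ready for any SRU-aware client.)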

To facilitate browsing, the increasingly popular faceted navigation
technique will be employed. Using the metadata contained in the EAD
files, very broad facets will be created. Examples include
subjects, formats, people, institutions, themes, and maybe dates.
Each facet will have associated with it sets of terms such as
"African Americans", "letters", "Dorothy Day", "Seton Hall University",
or "Catholic Social Action". Through a second set of reports, these facet/
term combinations will be displayed in a user's browser, and by
selecting them relevant content will be returned.

To broaden access to the CRRI's content, a third set of reports will
be written against the database to enable OAI-PMH (Open Archives
Initiative - Protocol for Metadata Harvesting). [7] These reports
will result in the creation of sets of XML files saved to the
computer's file system. An OAI data repository application will
provide access to the files and enable OAI service providers to
read the metadata and use it in other applications. We plan to use
XMLFile for the data repository. [8] An example of a service provider
is OAIster. [9]


Links

1. MySQL - http://mysql.com
2. MyLibrary - http://dewey.library.nd.edu/mylibrary
3. SRU - http://loc.gov/standards/sru
4. Zebra - http://indexdata.dk/zebra
5. Kinosearch - http://rectangular.com/kinosearch
6. Lucene - http://lucene.apache.org
7. OAI-PMH - http://openarchives.org
8. XMLFile - http://www.dlib.vt.edu/projects/OAI/software/xmlfile
9. OAIster - http://oaister.org

--
Eric Lease Morgan
University Libraries of Notre Dame



Re: [CODE4LIB] marc2oai

2007-05-29 Thread Joe Atzberger

Well, that's an impressive teaser, anyway, Andrew.  Looking forward to
your release!

On 5/29/07, Andrew Nagy [EMAIL PROTECTED] wrote:


 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 Eric Lease Morgan
 Sent: Tuesday, May 29, 2007 1:53 PM
 To: CODE4LIB@listserv.nd.edu
 Subject: [CODE4LIB] marc2oai

 Does anybody here know of a MARC2OAI program?


Eric, I have a small script that does this; it is fairly
simple.  Probably about 100 lines of code or so.

I have a nightly cron script that gets any new/modified marc records from
the past 24 hours out of the catalog and then runs marc2xml on the dump
file.  Then I have a small script that breaks up the large marcxml files
into individual xml files and imports them into SOLR!  I then can use an XSL
stylesheet such as the LOC's marc2oai to produce an OAI document, or
marc2rdf, etc., on the full marcxml files (since solr doesn't have the
original record).  I have yet to incorporate my OAI server code into this,
but since it is already written, it would be a fairly easy merge.

This is all built into my NextGen OPAC that I am working on and hope to
open-source sometime this summer.  So sorry, I'm not allowed to hand out the
code just yet :(

Thanks
Andrew