Re: [CODE4LIB] Conference all-timers?

2013-02-15 Thread Andrew Nagy
Around where I was sitting - there was myself, Dan Chudnov and Karen Coombs.

On Fri, Feb 15, 2013 at 9:53 AM, Michael J. Giarlo wrote:


 Every year when hands shoot up in response to the question of "how many of
 you have attended all code4lib conferences?", I neglect to note who's
 raising those hands.

 Who are my fellow all-timers?


Re: [CODE4LIB] 2013 Code4lib Conference Registration (Change of time)

2012-11-28 Thread Andrew Nagy
Will there be reserved registration slots for speakers, or do they need to
be ready to register 2 minutes before noon Eastern like a Bruce
Springsteen concert?

-- Forwarded message --
From: Francis Kayiwa
Date: Tue, Nov 27, 2012 at 1:16 PM
Subject: [CODE4LIB] 2013 Code4lib Conference Registration (Change of time)

Looks like quite a few of you missed the change of registration date. If
you have registered today, you did so on the Test Server and will need
to register again next week.

Registration was moved to December 4th at noon Eastern Standard Time.

Documentation is the castor oil of programming.  Managers know it must
be good because the programmers hate it so much.

[CODE4LIB] New Newcomer Dinner option

2012-02-03 Thread Andrew Nagy
Hi All - I just added another restaurant option to the newcomer dinner list
as the options are starting to look quite full.  I've listed Momiji - a new
Japanese restaurant that I have been wanting to try, and it's a very short
cab ride from the hotel.  If anyone signs up, I'll make a reservation.


Re: [CODE4LIB] Voting is open for code4lib 2012 presentations.

2011-11-22 Thread Andrew Nagy
My votes are not showing after returning to the voting page.  I thought I
remembered being able to modify my votes from previous years.  I went
through the first 30 or so, and wanted to come back to it to go through
more, but my votes are not persisting.  Is this a bug, a change, or a
failure in my memory?


On Tue, Nov 22, 2011 at 2:14 PM, Michael J. Giarlo wrote:


 On Tue, Nov 22, 2011 at 14:08, Michael B. Klein wrote:
  Hmm. 404'ing for me now.
  On Nov 22, 2011, at 4:22 AM, Ross Singer wrote:
  Ok, the results screen should no longer be throwing an error.
  Vote early, vote often,
  On Tue, Nov 22, 2011 at 6:57 AM, Ross Singer
  Mark, I'm only getting that for the results page.  Are you getting it
  somewhere else?
  I'll fix the results page as soon as I can.
  On Monday, November 21, 2011, Mark Diggory
  The ever popular...Internal Server Error
  On Mon, Nov 21, 2011 at 7:34 PM, Anjanette Young
  Voting for code4lib 2012 talks are now open.
  Voting will close at 5pm (PST) on December 9, 2011.
  Presentation criteria to keep in mind
 - Usefulness
 - Newness
 - Geekiness
 - Diversity of topics -- You will need your
  code4lib.org login in order to vote. If you do not have one you can get
  one at
  Presentation proposal descriptions can be found on the wiki
  Thank you to Ross Singer for keying in all 72 proposals!
  Mark Diggory
  2888 Loker Avenue East, Suite 305, Carlsbad, CA. 92010
  Esperantolaan 4, Heverlee 3001, Belgium

Re: [CODE4LIB] Code4lib 2012 Seattle. Call for presentation proposals

2011-10-07 Thread Andrew Nagy
I'd like to hear more about the DPLA project - I hope we get a proposal
about that this year!  I'll post it to the wiki page.


On Wed, Oct 5, 2011 at 6:17 PM, Anjanette Young wrote:

 Code4lib 2012 call for proposals.

 We are now accepting proposals for Code4lib 2012.

 Code4lib 2012 is a loosely-structured conference for library technologists
 to commune, gather/create/share ideas and software, be inspired, and forge
 collaborations.  The conference will be held Monday February 6th
 (Preconference Day) - Thursday February 9th, 2012 in Seattle, WA. More
 information can be found at

 Prepared Talks

 Head over to the call for proposals page at and submit your proposal
 for a prepared talk for this year's conference!  Proposals should be no
 longer than 500 words, and preferably far fewer.

 Prepared talks are 20 minutes (including setup and questions), and focus on
 one or more of the following areas:
  * tools (some cool new software, software library or integration platform)
  * specs (how to get the most out of some protocols, or proposals for new
  ones)
  * challenges (one or more big problems we should collectively address)

 The community will vote on proposals using the criteria of:
  * usefulness
  * newness
  * geekiness
  * diversity of topics
  * awesomeness

 Proposals can be submitted through Sunday, November 19th, 5pm (PST). Voting
 will commence soon thereafter and be open through Friday, December 9th.
 Successful candidates will be notified by December 12th. The submitter (and
 if necessary a second presenter) will be guaranteed an opportunity to
 register for the conference through December 23rd.

 Proposals for preconferences are also open until November 19th, 5pm (PST).

 We cannot accept every prepared talk proposal, but multiple lightning talk
 and breakout sessions will provide everyone who wishes to present with an
 opportunity to do so.

 Anjanette Young | Systems Librarian
 University of Washington Libraries
 Box 352900 | Seattle, WA 98195
 Phone: 206.616.2867

Re: [CODE4LIB] Code4Lib Community google custom search

2011-10-06 Thread Andrew Nagy
Nice job Jonathan - my first test search seemed to bring back rather
relevant materials with the first coming from the journal:

Very cool and very useful


On Thu, Oct 6, 2011 at 9:35 PM, Jonathan Rochkind wrote:

 So I was in #code4lib, and skome asked about ideas for library hours. And I
 recalled that there have been at least two articles in the C4L Journal on
 this topic, so suggested them.

 Then I realized that there's enough body of work in the Journal to be worth
 searching there whenever you have an "ideas for dealing with X" question.
 You might not find anything, but I think there's enough chance you will,
 illustrated by that encounter with skome.

 Then I realized it's not just the journal -- what about a Google Custom
 Search that searches over the Journal, the Code4Lib wiki, the Code4Lib
 website, and perhaps most interestingly -- all the sites listed in the Planet

 Then I made it happen. Cause it seemed interesting and I'm a perfectionist,
 I even set things up so a cronjob automatically syncs the list of sites in
 the Planet with the Google custom search every night.

 The Planet stuff ends up potentially being a lot of noise -- I tried to
 custom 'boost' stuff from the Journal, but I'm not sure it worked. But I did
 configure things with facet-like limits including a "just the planet" limit,
 if you do want that. But even though it's sometimes a lot of noise, it's
 also potentially the most interesting/useful part of the search, otherwise
 it'd pretty much just be a Journal search, but now it includes a bunch of
 people's blogs, as well as other sites deemed of interest to Code4Lib
 community (including a couple other open source library tech journals) --
 without any extra curatorial work, just using the list already compiled for
 the Planet.

 I'm curious what people think of it. Try some searches for library tech
 questions or information and see how good your results are. If people find
 this useful, I'll try to include it on the main webpage in
 some prominent place, spruce up the look and feel etc. (Or try to draft
 someone else to do that, I think my time to work on this might be _just_
 about up after staying until 9.30 hacking on this cause it seemed cool).

Re: [CODE4LIB] 2012 preconference proposals wanted!

2011-09-26 Thread Andrew Nagy
Is anyone leading this session or is it a free-for-all?  The Code4lib site is
down - so I can't see what's on the wiki.

We use Git very heavily with the engineering of Serials Solutions' Summon
and we'd be happy to have an engineer do a session on some of the ways we
use it on a fairly large project/codebase if the group is interested.


On Fri, Sep 23, 2011 at 12:17 PM, Rob Casson wrote:


 looking forward to it

 On Fri, Sep 23, 2011 at 11:46 AM, Cary Gordon
  Afternoon is great. I am willing to help present.
  I am not excited about doing a git /subversion comparison, and would
  rather see the time filled with git specific info. There is certainly
  enough of it to keep us busy.
  I am not a raconteur, but a couple years ago, when the Drupal
  migration from CVS was in its nascent stage, I was walking Dries
  Buytaert back to his hotel... on Rue Git in Paris. He asked if I
  thought that was portentous. I said it was bzr.
  On Fri, Sep 23, 2011 at 7:47 AM, Ian Walls wrote:
  Cool, I'll add this to the wiki, then.
  Anyone prefer morning v. afternoon?  Afternoon is currently empty, so I
  figure it'd make sense to default there for now.  Unless folks want to talk
  about Git for the whole day
  Giving the session a cute name... git lends itself well to such.  I'm in
  no way wedded to the name; I may have had too much/little caffeine this
  morning.
  On Fri, Sep 23, 2011 at 10:38 AM, Kevin S. Clarke
  On Fri, Sep 23, 2011 at 10:02 AM, Ian Walls wrote:
   If we still need someone to take the lead on this, I would
  I don't believe anyone else has volunteered to lead so if you want to
  do it, run with it!
  I'd be glad to do a quick bit on how easy it is to use gitolite for
  private git repositories, if there is time for it (with all the other
  good git topics that have been suggested).
  Ian Walls
  Lead Development Specialist
  ByWater Solutions
  Phone # (888) 900-8944
  Twitter: @sekjal
  Cary Gordon
  The Cherry Hill Company

Re: [CODE4LIB] Code4Lib 2012 Seattle Update.

2011-06-10 Thread Andrew Nagy
Hi Anj - I just wanted to let you know that Serials Solutions is working out
a plan to better support the conference.  We'd possibly like to sponsor an
evening event; we will have more information for you later in the summer.


On Tue, Jun 7, 2011 at 1:14 PM, Anjanette Young wrote:

 Code4Lib Seattle 2012 update.  Thanks to Elizabeth Duell of Orbis Cascade
 Alliance and Cary Gordon of, we finally have a venue with
 adequate (hopefully) bandwidth and wireless access points, a reasonable
  beverage minimum, and chairs!  The Renaissance Hotel (515 Madison St.,
 Seattle, WA 98104) is located in the chilly heart of downtown Seattle,
 close to the University district, but even closer to the restaurants, bars,
 breweries and distilleries in the Belltown, Downtown, Pioneer Square, and
 Capitol Hill neighborhoods.

 We could use lots of help, please consider volunteering for a committee:

 Anjanette Young | Systems Librarian
 University of Washington Libraries
 Box 352900 | Seattle, WA 98195
 Phone: 206.616.2867

Re: [CODE4LIB] Adding VIAF links to Wikipedia

2011-05-26 Thread Andrew Nagy
Ralph - this sounds like a very valuable process.  I would imagine it could
solve the problem illustrated here:

What would be the best path forward?  I'm not active in the Wikipedia
community - but I understand that there is a community of editors.  Perhaps
lobbying them for support while clearly identifying the value for the
community of scholarship would allow this to happen?

Does anyone have experience with the editorial group or policy group in the
wikipedia community?


On Thu, May 26, 2011 at 2:01 PM, Ralph LeVan wrote:

 OCLC Research would desperately love to add VIAF links to Wikipedia
 articles, but it seems to be very difficult.  The OpenLibrary folks tried to
 do it a while back and ended up getting their plans severely curtailed.  The
 discussion at Wikipedia is captured here:

 Probably for very good reasons, this seems to be a very political process.
  That means we need to have pretty good support both within and outside
 the Wikipedia community to do this.

 Starting with the friendliest community I can think of, is there such
 support?  Should we move forward on creating a ViafBot to stick VIAF links
 into Wikipedia?



Re: [CODE4LIB] dealing with Summon

2011-03-01 Thread Andrew Nagy
Hi Godmar - I can help address your questions about the fields directly.
Though it would be interesting to hear experiences from others who are
working with APIs to search systems such as Summon.

In regards to the publication date - the Summon API has the raw date
(which comes directly from the content provider), but we also provide a
field with a microformat containing the parsed and cleaned date that Summon
has generated.  We advise you to use our parsed and cleaned date rather
than the raw date.  The URL and URI fields are similar, the URL is the link
that we have generated - the URI is what is provided by the content
provider.  In your case, you appear to be referring to OPAC records, so the
URI is the ToC that came from the 856$u field in your MARC records.  The URL
is a link to the record in the OPAC.

If you need more assistance around the fields that are available via Summon,
I'd be happy to take this conversation off-list.
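The preference order described above - vendor-generated link over provider URI, cleaned date over raw date - can be sketched as a small normalization helper. This is illustrative only: the 'URL'/'url'/'URI' keys echo the thread below, while 'parsed_date' and 'raw_date' are placeholder names, not Summon's actual response schema.

```python
def pick_link(record):
    """Prefer the vendor-generated link, then fall back to the provider URI."""
    for key in ("URL", "url", "URI"):  # keys as discussed in the thread
        if record.get(key):
            return record[key]
    return None

def pick_date(record):
    """Prefer the cleaned/parsed date over the raw provider date.

    'parsed_date' and 'raw_date' are hypothetical field names."""
    return record.get("parsed_date") or record.get("raw_date")

rec = {"url": "http://opac.example.edu/record/1",
       "URI": "http://provider.example.com/toc/1",
       "raw_date": "20080901"}
print(pick_link(rec))  # → http://opac.example.edu/record/1
print(pick_date(rec))  # → 20080901
```

The point is simply to centralize the choice in one place, so display code never has to know which variants a given record carries.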

I think an interesting conversation for the Code4Lib community would be
around a standardized approach for an API that meets both the needs of the
library developer and the product vendor.  I recall a brief chat I had with
Annette about this same topic at a NISO conference in Boston a while back.
For example, we have SRU/W, but that does not provide support for all of the
features that a search engine would need (ie. facets, spelling corrections,
recommendations, etc.).  Maybe a new standard is needed - or maybe extending
an existing one would solve this need?  I'm all ears if you have any ideas.


On Tue, Mar 1, 2011 at 2:14 PM, Godmar Back wrote:

 Hi -

 this is a comment/question about a particular discovery system
 (Summon), but perhaps of more general interest. It's not intended as
 flamebait or criticism of the vendor or people associated with it.

 When integrating Summon into LibX (which works quite nicely btw,
 gratuitous screenshot is attached to this email) I found myself amazed
 by the multitude of possible fields and combinations returned in the
 resulting records. For instance, some records contain fields 'url'
 (lower case), and/or 'URL' (upper case), and/or 'URI' (upper case).
 Which one to display, and how?  For instance, some records contain an
 OPAC URL in the 'url' field, and a ToC link in the URI field. Why?

 Similarly, the date associated with a record can come in a variety of
 formats. Some are single-field (20080901), some are abbreviated
 (200811), some are separated into year, month, date, etc.  Some
 records have a mixture of those.

 My question is how do other adopters of Summon, or of emerging
 discovery systems that provide direct access to their records in
 general, deal with the roughness of the records being returned?  Are
 there best practices in how to extract information from them, and in
 how to prioritize relevant and weed out irrelevant or redundant
 fields?

  - Godmar

Re: [CODE4LIB] Ride sharing IND - Bloomington - IND

2010-12-17 Thread Andrew Nagy
To help better track ride share opportunities, I created a page on the
Code4Lib wiki.

This way folks seeking ride share opportunities can sign up for a ride - and
those offering can list their ride.


On Thu, Dec 16, 2010 at 5:52 PM, Cary Gordon wrote:

 I will be renting a car and driving to Bloomington on Sunday, the 6th
 at about 630 PM (assuming on-time arrival at 6ish) and returning on
 the 10th in time to make my 7 PM flight.

 I can take one or two people with a reasonable amount of luggage each
 way, and no, they don't have to be the same people.

 Let me know if you are interested.



 Cary Gordon
 The Cherry Hill Company

Re: [CODE4LIB] algorithm for Summon's Recommender

2010-05-06 Thread Andrew Nagy
Hi Ya'aqov - I'm about to board a plane so I don't have much time for
a well-formed response.  We do not have anything published about
Summon's relevancy algorithms nor the recommendation engine.  I'd be
happy to answer any specific questions offline as I don't feel it
appropriate to get into details about a commercial product in this
forum.


On 5/6/10, Ziso, Ya'aqov wrote:
 hi Andrew,

 bX derives from research done at Los  Alamos National Laboratory by Johan
 Bollen and Herbert Van de Sompel. Its ranking and algorithm can be analyzed
 in the published article
 Can SerialsSolutions point us to something explaining Summon’s Recommender?

 •  If you're not part of the problem, you're not part of the solution •

Sent from my mobile device

Re: [CODE4LIB] Q: what is the best open source native XML database

2010-01-17 Thread Andrew Nagy
I've had the best luck with eXist and BerkeleyDB XML.

Both support XQuery and have indexing features based on any XML structure.
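As a concrete illustration of querying one of these from code: eXist exposes stored collections over a REST interface that accepts an ad-hoc XQuery via the `_query` URL parameter. A minimal sketch in Python, assuming a local eXist instance on its default port - the host, collection name, and query here are invented, so check your version's documentation for the exact details:

```python
from urllib.parse import urlencode

def exist_query_url(base, collection, xquery, howmany=10):
    """Build a GET URL for eXist's REST interface.

    eXist evaluates the XQuery passed in _query against the given
    collection; _howmany caps the number of hits returned."""
    params = urlencode({"_query": xquery, "_howmany": howmany})
    return "%s/exist/rest%s?%s" % (base, collection, params)

# Hypothetical local server and collection, purely for illustration.
url = exist_query_url("http://localhost:8080", "/db/docs",
                      '//doc[contains(., "library")]')
print(url)
# Fetching this URL (e.g. with urllib.request.urlopen) returns the matching
# fragments wrapped in an exist:result element -- requires a running server.
```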


On 1/16/10, Godmar Back wrote:

 we're currently looking for an XML database to store a variety of
 small-to-medium sized XML documents. The XML documents are
 unstructured in the sense that they do not follow a schema or DTD, and
 that their structure will be changing over time. We'll need to do
 efficient searching based on elements, attributes, and full text
 within text content. More importantly, the documents are mutable.
 We'd like to bring documents or fragments into memory in a DOM
 representation, manipulate them, then put them back into the database.
 Ideally, this should be done in a transaction-like manner. We need to
 efficiently serve document fragments over HTTP, ideally in a manner
 that allows for scaling through replication. We would prefer strong
 support for Java integration, but it's not a must.

 Have others encountered similar problems, and what have you been using?

 So far, we're researching: eXist-DB ( ),
 BaseX ( ), MonetDB/XQuery
 ( ), Sedna
 ( ). Wikipedia lists a few
 others here:
 I'm wondering to what extent systems such as Lucene, or even digital
 object repositories such as Fedora could be coaxed into this usage

 Thanks for any insight you have or experience you can share.

  - Godmar

Sent from my mobile device

Re: [CODE4LIB] David Walker Wins Third OCLC Research Software Contest

2009-07-22 Thread Andrew Nagy

Just watched the video - great job David!

On Wed, Jul 22, 2009 at 9:01 PM, Roy Tennant wrote:

 DUBLIN, Ohio, USA, 22 July 2009

 David Walker Wins Third OCLC Research Software Contest

 David Walker has won the Third OCLC Research Software Contest with Bridge, a
 set of services to provide a configurable and customizable full record
 display made up of WorldCat services.  These services provide the ability
 for an individual library to customize the full record display of WorldCat
 records to their particular situation.

 The contest judges were impressed with how Mr. Walker was able to provide a
 set of very useful methods to enhance WorldCat services from the perspective
 of individual libraries. The software architecture, code, and documentation
 also were impressive. As the contest winner, Mr. Walker will receive a check
 for $2,500 and a visit with OCLC researchers and others in Dublin, Ohio

 David Walker is Library Web Services Manager at California State University.
 More information about Bridge is linked below.

 The Third OCLC Research Software Contest ran from mid-April through the end
 of June.  Its goal was to encourage innovation in the use of OCLC web-based
 services for libraries.

 Entries were judged by a panel of expert practitioners and academicians from
 OCLC and the library/information community:

 Kevin Clarke
 Coordinator of Web Services
 Belk Library and Information Commons
 Appalachian State University

 Thom Hickey
 Chief Scientist

 Tod Matola
 Software Architect

 Ross Singer
 Interoperability and Open Standards Champion
 and winner of the Second OCLC Research Software Contest

 Roy Tennant
 Senior Program Officer
 OCLC Research

 More information:

 David Walker's Bridge

 Contest Overview

 Contest judges


 Roy Tennant
 Senior Program Officer
 OCLC Research

 Robert Bolander
 Senior Communications Officer
 OCLC Research

Re: [CODE4LIB] How to access environment variables in XSL

2009-06-19 Thread Andrew Nagy
If you are using an XSL processor from a programming language (Java,
PHP, Ruby) you can assign a variable, pass it to the XSL file, and use the
variable in the file much like you would in any other scripting environment.

You can also go one step further and use XQuery, which gives you a
FLWOR-based environment where you can declare variables and
introduce more advanced logic than XSL allows.
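For example, with Python's lxml (a sketch assuming lxml is installed; PHP and Java processors have equivalent parameter-passing mechanisms), an environment variable can be handed to the stylesheet as an xsl:param:

```python
import os
from lxml import etree

# A stylesheet declaring a top-level parameter we can fill from outside.
xslt_doc = etree.XML("""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="server"/>
  <xsl:template match="/">
    <p>Served by <xsl:value-of select="$server"/></p>
  </xsl:template>
</xsl:stylesheet>""")

transform = etree.XSLT(xslt_doc)
doc = etree.XML("<page/>")

# strparam() quotes the value so it is passed as an XPath string literal.
server = etree.XSLT.strparam(os.environ.get("SERVER_NAME", "localhost"))
result = transform(doc, server=server)
print(str(result))  # e.g. <p>Served by localhost</p> when SERVER_NAME is unset
```

The same idea applies server-side: the CGI script or servlet reads the environment and forwards the values as stylesheet parameters, since XSLT itself has no direct access to them.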


On Fri, Jun 19, 2009 at 3:44 PM, Doran, Michael D wrote:

 I am working with some XSL pages that serve up HTML on the web.  I'm new to
 XSL.   In my prior web development, I was accustomed to being able to access
 environment variables (and their values, natch) in my CGI scripts and/or via
 Server Side Includes.  Is there an equivalent mechanism for accessing those
 environment variables within an XSL page?

 These are examples of the variables I'm referring to:

 In a Perl CGI script, I would do something like this:
my $server = $ENV{'SERVER_NAME'};

 Or in an SSI, I could do something like this:
<!--#echo var="REMOTE_ADDR" -->

 If it matters, I'm working in: Solaris/Apache/Tomcat

 I've googled this but not found anything useful yet (except for other
 people asking the same question).  Maybe I'm asking the wrong question.  Any
 help would be appreciated.

 -- Michael

 # Michael Doran, Systems Librarian
 # University of Texas at Arlington
 # 817-272-5326 office
 # 817-688-1926 mobile

Re: [CODE4LIB] Serials Solutions Summon

2009-05-04 Thread Andrew Nagy
David - Keep in mind that aggregators are not the original publishers of
content - so even if an aggregator is not yet participating in Summon, the
content in their aggregated databases most often *is* indexed by the
service. To date there are already over 80 individual content providers
participating *in addition to* competing aggregators ProQuest and Gale,
bringing together content from over four thousand publishers.
Regardless of the competitive landscape among aggregators, publishers are
participating in Summon in order to increase discovery of their content.
It's a win-win.


On Tue, Apr 21, 2009 at 11:33 AM, Walker, David wrote:

 Even though Summon is marketed as a Serials Solutions system, I tend to
 think of it more as coming from Proquest (the parent company, of course).

 Summon goes a bit beyond what Proquest and CSA have done in the past,
 loading outside publisher data, your local catalog records, and some other
 nice data (no small thing, mind you).  But, like Rob and Mike, I tend to see
 this as an evolutionary step for a database aggregator like Proquest rather
 than a revolutionary one.

 Obviously, database aggregators like Proquest, OCLC, and Ebsco are well
 positioned to do this kind of work.  The problem, though, is that they are
 also competitors.  At some point, if you want to have a truly unified local
 index of _all_ of your databases, you're going to have to cross aggregator
 lines.  What happens then?


 David Walker
 Library Web Services Manager
 California State University
 From: Code for Libraries On Behalf Of Dr R. Sanderson
 Sent: Tuesday, April 21, 2009 8:14 AM
 Subject: Re: [CODE4LIB] Serials Solutions Summon

 On Tue, 21 Apr 2009, Eric Lease Morgan wrote:
  On Apr 21, 2009, at 10:55 AM, Dr R. Sanderson wrote:
  How is this 'new type' of index any different from an index of OAI-PMH
  harvested material?  Which in turn is no different from any other
  local search, just a different method of ingesting the data?

  This new type of index is not any different in functionality from a
  well-implemented OAI service provider with the exception of the type
  of content it contains.

 Not even the type of content, just the source of the content.
 E.g. SS have come to an agreement with the publishers to use their
 content, and they've stuffed it all in one big index with a nice

 NTSH, Move Along...


Re: [CODE4LIB] Serials Solutions Summon

2009-04-22 Thread Andrew Nagy
On Wed, Apr 22, 2009 at 5:08 AM, Laurence Lockton wrote:

 Date:Tue, 21 Apr 2009 13:36:30 -0400
 From:Diane I. Hillmann
 Subject: Re: Serials Solutions Summon


 3. Because they also have data on what journals any particular library
 customer has subscribed to, they can customize the product for each
 library, ensuring that the library's users aren't served a bunch of
 results that they ultimately can't access.

 This is one of the great advantages of a local aggregated index, being able
 to flag which documents are actually available to your users, and giving
 them the choice of searching only for these. Lund University's ELIN does
 this and it's really popular. (See a picture

 Is this being offered in Summon and WorldCat Local?

Laurence - Summon does have fulltext access as well as scholarly or
peer-reviewed as available facets, allowing users to narrow their
search results by these two facets.  And it is great that you point this out
- this is one of the great benefits of having a single unified index.  You
get to pull all sorts of gems out of the boulders of content.  I am
personally getting really excited for what our community (code4lib) will be
able to invent on top of services such as Summon.  I think we are going to
be able to find many more gems as well as mashups that allow for some
fantastic tools.


Re: [CODE4LIB] Serials Solutions Summon

2009-04-18 Thread Andrew Nagy
Yitzchak - I'd be more than happy to answer any questions you have about
Summon.  I will give a brief description to answer your questions - but for
any other questions you might have we can discuss offline so as not to spam
the mailing list with lots of propaganda for Summon - though it is really
awesome and everyone should purchase a subscription :)

Summon is really more than an NGC as we are selling it as a service - a
unified discovery service.  This means that it is a single repository of the
library's content (subscription content, catalog records, IR data, etc.).
Federated search is not a part of Summon (though federated search could be
used alongside Summon); all of your library's content is indexed in a
single repository - no need for broadcast searching.  We have an API for
Summon that allows you to access the service with all of the features that
we offer through the Summon User Interface.  This allows you to plug
Summon searching into an NGC such as VuFind or Blacklight (I've done the
development for Summon integration in VuFind already).  Our company is also
working on the Summon integration for AquaBrowser.

I'd be more than happy to give a demonstration for your institution on
Summon so you can see it in action and get a better understanding.

Please email me directly for any other questions - or if you would like to
schedule a demonstration for your library.


On Fri, Apr 17, 2009 at 12:03 PM, Yitzchak Schaffer wrote:

 Hello all:

 I see that there was an Andrew Nagy-led breakout on Summon at the con.
  Summon is an NGC product with the distinction of using a local copy of
  indexes of licensed content (by agreement with Elsevier, JSTOR, et alia) for
  federated search - rather than the traditional Z39.50 or API calls to vendor
  databases.
 Can anyone offer a brief summary of what was discussed?  I am particularly
  interested in the feasibility of obtaining local indexes for use in an OSS
  system.


 Yitzchak Schaffer
 Systems Manager
 Touro College Libraries
 33 West 23rd Street
 New York, NY 10010
 Tel (212) 463-0400 x5230
 Fax (212) 627-3197
 Twitter /torahsyslib

Re: [CODE4LIB] Printed catalogs

2009-03-06 Thread Andrew Nagy
If you do choose to use XSLT, the Library of Congress has a bunch of XSLTs
for MARCXML which will save a tremendous amount of time for you.
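If you end up post-processing the MARCXML outside of XSLT, pulling fields out with standard tools is also straightforward. A small stdlib sketch - the namespace is MARCXML's published one, but the sample record and helper are invented for illustration:

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"

# An invented one-record sample, just to exercise the helper.
record_xml = """\
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">A sample title /</subfield>
    <subfield code="b">with a subtitle.</subfield>
  </datafield>
</record>"""

def title_of(record):
    """Join the 245 subfields into a single display title."""
    path = ".//{%s}datafield[@tag='245']/{%s}subfield" % (MARC_NS, MARC_NS)
    return " ".join(s.text for s in record.findall(path) if s.text)

record = ET.fromstring(record_xml)
print(title_of(record))  # → A sample title / with a subtitle.
```

For anything beyond quick extraction, though, the LOC stylesheets (or a MARC library) save you from re-implementing the format's many edge cases.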


On Fri, Mar 6, 2009 at 1:09 PM, Jared Camins wrote:


  I think this sort of question would fall under the purview of this list, but
  if there's a better forum for my question, please let me know. I am
 cataloging a special collection in MARC (to take advantage of LC copy
 cataloging, primarily), but at the end of the project I will be producing a
 printed catalog for the owner of the collection. My plan is to use an XSLT
 stylesheet to produce the catalog from MARCXML. I already threw together a
 stylesheet to produce a brief HTML bibliography of the collection, so I am
  confident that this plan would work. We would probably use LaTeX rather than
  HTML for output for the final catalog, since that would make the final
 printing easier, not to mention index generation.

  My question is, has anyone done something like this? Any lessons learned the
  hard way, stylesheets I could model ours on, or any other advice?

 Thanks in advance for all your help.

 Jared Camins-Esakov

 P.S. I should mention that I am not entirely wed to the idea of using an
 XSLT stylesheet. It seems like the path of least resistance, but if anyone
  could suggest a better tool, I would be very interested to learn about it. I
  do have a background in programming, so I would be comfortable using
 C/Perl/whatever, if there were a good reason to do so.

 Jared Camins-Esakov
 Freelance bibliographer and archivist
 (cell) +1 (917) 880-7649

Re: [CODE4LIB] MARC-XML - Qualified Dublin Core XSLT

2009-03-06 Thread Andrew Nagy
Hey David - per my last posting in regards to MARCXML XSLTs - the LOC
maintains a large collection of XSLTs for MARCXML that are very thorough.


On Fri, Mar 6, 2009 at 3:03 PM, Walker, David wrote:

 Hi All,

 Anyone have an XSLT style sheet to convert from MARC-XML to Qualified
 Dublin Core?

 I'm looking to load these into DSpace, if that makes a difference.  Looks
  like LOC only has MARC-XML to Simple Dublin Core.  This page [1] mentions
  'MARCXML to Qualified DC style sheets' developed at the University of
  Illinois, but the links are dead.



 David Walker
 Library Web Services Manager
 California State University

Re: [CODE4LIB] release management

2008-11-04 Thread Andrew Nagy
I second the notion for Fogel's book.

From: Code for Libraries [EMAIL PROTECTED] On Behalf Of Randy Metcalfe
Sent: Wednesday, October 29, 2008 10:42 AM
Subject: Re: [CODE4LIB] release management

2008/10/29 Jonathan Rochkind [EMAIL PROTECTED]:
 Can anyone recommend any good sources on how to do 'release management' in
 a small distributed open source project. Or in a small in-house not open
 source project, for that matter. The key thing is not something assuming
 you're in a giant company with a QA team, but instead a small project with
 a few (to dozens) of developers, no dedicated QA team, etc.

 Anyone have any good books to recommend on this?

Karl Fogel's book Producing Open Source Software is an excellent
choice, though it is not solely focused on release management.



Randy Metcalfe

Re: [CODE4LIB] Open Source Discovery Portal Camp - November 6 - Philadelphia

2008-10-07 Thread Andrew Nagy
I updated the wiki for the conference with a link to nearby hotels that are
suggested by PALINET.

Here is the link:


 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 Eric Lease Morgan
 Sent: Tuesday, October 07, 2008 12:34 PM
 Subject: Re: [CODE4LIB] Open Source Discovery Portal Camp - November 6
 - Philadelphia

 It looks as if the University of Pennsylvania is having an event on or
 around the same time as the VUFind event, and that is why things are
  filling/full up. FYI. I believe it is better to make reservations sooner
 rather than later.


[CODE4LIB] Open Source Discovery Portal Camp - November 6 - Philadelphia

2008-10-02 Thread Andrew Nagy
Implementing or hacking an Open Source discovery system such as VuFind or
Blacklight? Interested in learning more about Lucene/Solr applications?

Join the development teams from VuFind and Blacklight at PALINET in
Philadelphia, November 6, 2008, for a day of discussion and sharing. We hope to
examine difficult issues in developing discovery systems, such as:

* ILS Connectivity
* Authority Control
* Data Importing
* User Interface Issues

Date and time: November 6, 2008, 9:00am to 4:00pm

Registration Fee: $40 for PALINET members and $50 for PALINET non-members.

For more information and how to register, visit our conference wiki:

Re: [CODE4LIB] LOC Authority Data

2008-10-01 Thread Andrew Nagy
If only we knew someone who worked at the LOC that we could tell this 
information to ...

From: Code for Libraries [EMAIL PROTECTED] On Behalf Of Ed Summers [EMAIL 
Sent: Monday, September 29, 2008 7:02 PM
Subject: Re: [CODE4LIB] LOC Authority Data

On Mon, Sep 29, 2008 at 6:01 PM, Jonathan Rochkind [EMAIL PROTECTED] wrote:
 I thought I remembered something about Casey Bisson doing exactly that with
 a grant/award he received? I forget what happened to it. A snapshot would
 just be a snapshot of course, it wouldn't include records created or
 modified after the snapshot.

That was the bibliographic records which he purchased and donated to
the Internet Archive:

They are also available via a torrent:

It definitely would be nice to do the same thing for the authority
data. It's kind of absurd to me that this data isn't already in the
public domain, since it's uh in the public domain. But what do I know,
I'm not a lawyer.


Re: [CODE4LIB] LOC Authority Data

2008-09-29 Thread Andrew Nagy
I was aware of this data - but I'm really curious if anyone has ever heard of 
or seen a scraping process that is run frequently to get updates.  The data on 
the fred2.0 site is from 2006.  I'd like to try to keep an up-to-date copy - 
especially since we Americans are entitled to free access to the data.


 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 Jason Griffey
 Sent: Tuesday, September 23, 2008 5:06 PM
 Subject: Re: [CODE4LIB] LOC Authority Data

 Simon Spero at UNC did a scrape of the entirety of the LoC Authority
 files in Dec of 2006. They are available at Fred 2.0:


 On Tue, Sep 23, 2008 at 4:35 PM, Andrew Nagy
  Hello - I am curious if anyone knows of a way to access the entire
 collection of authority records from the LOC.  It seems that the only
 way to access them know is one record at a time.  Feel free to email me
 off line if you are uncomfortable posting a response to the list.

Re: [CODE4LIB] LOC Authority Data

2008-09-29 Thread Andrew Nagy
 Although note that these are only *subject* authorities.

 Andrew, I think you may also be looking for name authorities (since I
 assume this inquiry came from a suspiciously topically similar thread
 on vufind-tech).

Yes - I would love to be able to obtain all authority files.

 Also, Ed's SKOS data lumps all of the subfields into one string
 literal, so:

Yeah - the MARC record has much more data than the RDF file.  I haven't 
explored the indexing process of authority records in enough detail yet to 
determine if this string munging is a problem or not.


Re: [CODE4LIB] Conference: Access 2008 in Hamilton, ON -- October 1-4.

2008-08-27 Thread Andrew Nagy
This may be a bit too specific or complex for 1 day - but I will throw it out 
there and would be more than happy to lead the event.

This is an idea I kind of formalized today:

Develop an authority control mechanism in vufind that would 
utilize the Library of Congress authority control data and automatically 
authorize bibliographic records in vufind.

Step 1:  Download and index LOC authority author records
Step 2:  Update all bib records in a vufind instance with authorized forms and 
alternate forms
Step 3:  Delete unused authority records in authority index
Step 4:  Create a script that processes this on a periodic basis (monthly or 

Voila - free authority control for the library's catalog (assuming they opt to 
use vufind)
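In rough Python, steps 1-3 could look something like this (a toy in-memory sketch; the headings and data structures are illustrative, not VuFind's actual implementation):

```python
# Toy authority index: authorized heading -> alternate ("see from") forms.
# A real implementation would build this from indexed LC authority records.
AUTHORITIES = {
    "Twain, Mark, 1835-1910": ["Clemens, Samuel Langhorne, 1835-1910"],
}

# Invert the index so any known form maps to its authorized heading (step 2).
SEE_FROM = {alt: auth for auth, alts in AUTHORITIES.items() for alt in alts}

def authorize(heading):
    """Return the authorized form of a bib heading, or the heading unchanged."""
    if heading in AUTHORITIES:
        return heading
    return SEE_FROM.get(heading, heading)

def prune(used_headings):
    """Step 3: keep only authority records actually referenced by bib records."""
    return {h: alts for h, alts in AUTHORITIES.items() if h in used_headings}
```

Step 4 would just be this logic wrapped in a cron job that re-runs the match against the current bib index.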


 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 John Fink
 Sent: Tuesday, August 26, 2008 1:30 PM
 Subject: [CODE4LIB] Conference: Access 2008 in Hamilton, ON -- October

 Also folks, I'm still soliciting Access Hackfest ideas -- let me know
 if you
 have any.


 Registration is now open for Access 2008, Canada's premier library
 technology conference that focuses on issues relating to technology
 planning, development, challenges and solutions.

 *When*: Oct. 1 - 4, 2008

 *Where*: Hamilton, Ontario

 *How:* Visit the conference website to register:

 *What:* Check the conference website for the exciting program! Keynotes this
 year will be Karen Schneider and Bob Young!

 This year the conference will be held in Hamilton, Ontario at the
 Hamilton Hotel (conference) and Hamilton Public Library (Hackfest) from
 October 1-4 and is hosted by:

 McMaster University, Hamilton Public Library, Mohawk College & Brock

 **Reserve your room at the Sheraton by Sept. 5th to secure the

 Spots are filling up fast - please register soon!

 *Need conference funding?*

 You may qualify for a grant! There are two grants available, each worth

 ProQuest Student Travel Grant (for students only)

 Equinox-Evergreen First-Timer Grant (for first-time Access attendees

 For more information about these grants and to apply, see the

 -- library culture and technology.

Re: [CODE4LIB] III SIP server

2008-06-12 Thread Andrew Nagy
Yes - Please do share!

Here is my vote for an SVN server hosted at


 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 Walker, David
 Sent: Wednesday, June 11, 2008 6:00 PM
 Subject: Re: [CODE4LIB] III SIP server

 I'd like to see the PHP code, Mark.  Would you mind sending it to me,
 or perhaps posting it somewhere where we all might download it?



 David Walker
 Library Web Services Manager
 California State University


 From: Code for Libraries on behalf of Mark Ellis
 Sent: Wed 6/11/2008 8:42 AM
 Subject: Re: [CODE4LIB] III SIP server


 What are you using for a client?  I have some PHP for getting patron
 information, but there's nothing III specific about it, so I don't know
 if it'd be helpful.  Do you have the 3M SIP SDK?


 Mark Ellis
 Manager, Information Technology
 Richmond Public Library
 Richmond, BC
 (604) 231-6410

 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 Schneider, Wayne
 Sent: Tuesday, June 10, 2008 4:29 AM
 Subject: [CODE4LIB] III SIP server

 Has anyone out there attempted to code to III's SIP server?  We're new
 to III, having just merged with another library system that is a III
 customer, and were hoping to be able to use SIP for some basic customer
 account information - nothing too fancy, just basically some of what is
 supported in version 2.00 of the protocol.  Name and address would be
 nice (name we seem to get, but no address), items out, items on hold,
 fines and fees, etc.  Our other ILS, SirsiDynix Horizon, has pretty good
 support for SIP 2.00 features, only somewhat idiosyncratic, with a few
 fairly well-documented extensions, and we were hoping to find the same
 level of support in III's server.  Is this an entirely unreasonable

 Wayne Schneider
 ILS System Administrator
 Hennepin County Library

Re: [CODE4LIB] Internet Archive collection codes?

2008-06-04 Thread Andrew Nagy
Excuse me if I am late to the game on this one - but at the Code4Lib conference 
either Brewster Kahle or Aaron Swartz spoke about an API to either the open 
library or the internet archive.  Is this available, or any plans to release 
this?  It seems like you are referring to some sort of API.


 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 [Alexis Rossi]
 Sent: Tuesday, June 03, 2008 10:58 PM
 Subject: Re: [CODE4LIB] Internet Archive collection codes?


 You can do a search for mediatype:collection to return results for all
 4200+ collections.

 We have a search interface that will return specific fields for this
 in xml format, if you'd like, but I'll need to give you some
 to access it.  Feel free to send me an email if you'd like to use that


  Does anyone know where to get a list of Internet Archive collection
  codes and their human-displayable display labels?
  For instance:
  americana = American Libraries
  gutenberg = Project Gutenberg
  librivoxaudio = [hell if I know]
  Some of these I can 'scrape' from the quick search box popup on the
  website. But they're not all in there. And maybe there's a better place to
  get these?
  Anyone know where the right place to ask this of the IA and/or IA
  developer community is?

Re: [CODE4LIB] how to obtain a sampling of ISBNs

2008-04-28 Thread Andrew Nagy
When playing around with OCLC's XISBN service, I plugged in the ISBN number for 
one of the Gone with the Wind books we have at our library - it returned 
something like 150 similar ISBN numbers.  You could try doing that for a few 
titles.
Just an idea ...
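That seed-and-expand approach could be sketched like this (the lookup is stubbed with made-up ISBNs; a real version would make an HTTP call to OCLC's service instead):

```python
import random

def related_isbns(isbn):
    """Stand-in for an xISBN-style lookup (the real service returned other
    editions of the same work).  The ISBNs here are fabricated placeholders."""
    CANNED = {"0000000001": ["0000000002", "0000000003"]}
    return CANNED.get(isbn, [])

def sample_isbns(seeds, k, rng=None):
    """Expand a few seed ISBNs into a pool of related ISBNs, then sample k."""
    rng = rng or random.Random(0)
    pool = sorted({i for s in seeds for i in [s] + related_isbns(s)})
    return rng.sample(pool, min(k, len(pool)))
```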


 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 Godmar Back
 Sent: Monday, April 28, 2008 9:35 AM
 Subject: [CODE4LIB] how to obtain a sampling of ISBNs


 for an investigation/study, I'm looking to obtain a representative
 sample set (say a few hundreds) of ISBNs. For instance, the sample
 could represent LoC's holdings (or some other acceptable/meaningful
 population in the library world).

 Does anybody have any pointers/ideas on how I might go about this?


  - Godmar

Re: [CODE4LIB] place for code examples?

2008-03-31 Thread Andrew Nagy
I think a snippet repository would be a fantastic idea that would fit well 
within the code4lib website.  Dokuwiki would also be a good fit for this and 
would allow people to share an OAI harvester in under 50 lines, etc.
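For what it's worth, the parsing half of such a harvester really does fit in a handful of lines using only the Python standard library (a sketch; fetching pages and looping on resumptionTokens is left out):

```python
import xml.etree.ElementTree as ET

# OAI-PMH 2.0 namespace, in ElementTree's Clark notation.
OAI = "{http://www.openarchives.org/OAI/2.0/}"

def parse_list_records(xml_text):
    """Parse one OAI-PMH ListRecords response.

    Returns (record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [rec.findtext(f"{OAI}header/{OAI}identifier")
           for rec in root.iter(OAI + "record")]
    token = root.findtext(f".//{OAI}resumptionToken")
    return ids, token
```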


 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 Jonathan Rochkind
 Sent: Monday, March 31, 2008 11:36 AM
 Subject: Re: [CODE4LIB] place for code examples?

 I don't know if it's the best solution, but you could use the code4lib
 wiki if you like.  Won't have code formatting or
 anything like that.

 Incidentally, I'm interested in getting a DokuWiki installation going
 for code4lib, which I think will serve our needs somewhat better than
 the current MediaWiki.  But that goes back to the thread I introduced
 which died about how to grant shell access to code4libbers on the OSU
 hosted machine.  Everyone seemed to agree that one or two or three
 code4libbers were necessary to accept responsibility as app admin
 coordinator on the machine, but nobody actually volunteered to do it,
 so we're a bit stuck.  If we had a process/structure in place, and
 was an app you wanted installed on to do this, there might
 be a way to do that---depending on what process/structure we come up
 with. But without one...


 Keith Jenkins wrote:
  Does there already exist some place to put some code examples to
  with the code4lib community?  (I'm thinking of snippets somewhere on
  the order of 10-100 lines, like the definition of a php function.)

 Jonathan Rochkind
 Digital Services Software Engineer
 The Sheridan Libraries
 Johns Hopkins University
 rochkind (at)

[CODE4LIB] VuFind 0.8 Release

2008-03-18 Thread Andrew Nagy
Excuse the Cross Posting

Hello All - I am pleased to announce the latest release of VuFind - the open 
source library resource discovery platform.  Version 0.8 Beta is now available 
for download - you can access the download link from or from

The major enhancement in version 0.8 is our new MARC import tool developed by 
Wayne Graham.  This should help resolve any issues dealing with importing 
records, as well as provide a speed enhancement.

If you are interested in trying out vufind - have a look at our live demo:

Or feel free to join our mailing list:

Andrew Nagy

Re: [CODE4LIB] Planning open source Library system at Duke

2008-01-28 Thread Andrew Nagy
 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 Nathan Vack

 Isn't there already an extant open-source ILS that's out there, and
 reputed to be rather good?

 I'm all for parallel approaches to problems... but the world of ILSes
 is pretty small. Maybe use fat cash from Mellon to help bake
 Evergreen the rest of the way?

Hear Hear!

I'm sure our library would love to be a part of a grant where large sums of money 
get thrown at some of the existing open source ILSs to further the development 
in the areas that academic libraries need.  The last thing the library 
community needs is yet another planning group to analyze the next generation 
catalog or to survey the libraries to determine if they are happy or not.  I 
think Marshall Breeding and others' survey results are conclusive enough.  We 
all know we need something better - let's start working on it!


 On Jan 28, 2008, at 4:26 PM, John Little wrote:

  The Duke University Libraries are preparing a proposal for the Mellon
  Foundation to convene the academic library community to design an
  source Integrated Library System (ILS).  We are not focused on
  developing an
  actual system at this stage, but rather blue-skying on the elements
  academic libraries need in such a system and creating a blueprint.
  now, we are trying to spread the word about this project and find
  out if
  others are interested in the idea.

Re: [CODE4LIB] z39.50 holdings schema

2007-12-17 Thread Andrew Nagy
Emily - we are investigating NCIP quite a bit here for use with VuFind.  Maybe 
this might be an appropriate standard to standardize on?

Take care,

 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 Emily Lynema
 Sent: Monday, December 17, 2007 9:42 AM
 Subject: [CODE4LIB] z39.50 holdings schema

 Anybody in this group have any experience using / implementing the
 z39.50 holdings schema?

 As part of the DLF ILS Discovery Interface Task Force, we are looking
 for a good schema to define holdings and item-related information (such
 as circulation status). While MARCXML is always an option for MARC
 holdings, I have the sense (aka, I know) that not all institutions /
 ILSs create MARC holdings for all records. So it would be nice to have a
 schema into which it would be easy to translate either a MARC holdings
 record or just local holdings stored in some other way + circulation

 The rumor on the street is that z39.50 holdings schema is too complex
 and has never really been used. Anyone want to confirm or deny?

 I'm also interested in the up and coming ISO Holdings Schema (ISO
 that it sounds like has been motivated along by OCLC-PICA. But I don't
 have much information on that, so I'd be interested in hearing from
 anyone who knows more about that one, as well.

 Emily Lynema
 Systems Librarian for Digital Projects
 Information Technology, NCSU Libraries

Re: [CODE4LIB] [Fwd: z39.50 holdings schema]

2007-12-17 Thread Andrew Nagy
 It is also my understanding that while the Voyager NCIP API supports
 their ILL product, it was not meant to serve as a general purpose NCIP
 API.  I believe that that accounts for the lack of (customer)
 documentation.  Back in March of 2004, the then Endeavor Voyager
 Product Manager discussed their plans for further development of
 Voyager's NCIP API, and I don't think things have changed much since
 then [1].  If you've heard (or know) different, please let us (Voyager
  customers) know.  I've had my eye on NCIP as an API for quite some time.

Michael - thanks for the feedback.  I agree with everyone else that NCIP is not 
the killer app for ILS interoperability - but it's the closest thing we have 
at this moment.  What I am envisioning with VuFind is a base class that does 
NCIP functionality and then specialized classes for each ILS that tweak the 
NCIP messages.

From what I have heard - Voyager 7 is supposed to have a much fuller NCIP 
implementation and I believe the same story for SirsiDynix.  But these are 
just that - stories.  Also I believe both Evergreen and Koha have NCIP as well.
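The base-class-plus-tweaks idea might look roughly like this (element names are simplified stand-ins, not the full NCIP schema, and the Voyager tweak is hypothetical):

```python
class NCIPDriver:
    """Base driver: builds a generic NCIP-style LookupUser message."""
    agency = "GENERIC"

    def user_element(self, user_id):
        # Hook point that per-ILS subclasses can override.
        return f"<UserId>{user_id}</UserId>"

    def lookup_user(self, user_id):
        return ("<NCIPMessage><LookupUser>"
                f"<AgencyId>{self.agency}</AgencyId>"
                + self.user_element(user_id) +
                "</LookupUser></NCIPMessage>")

class VoyagerDriver(NCIPDriver):
    """Hypothetical per-ILS subclass that tweaks one piece of the message."""
    agency = "VOYAGER"

    def user_element(self, user_id):
        # Pretend this ILS wants a visible (barcode-style) identifier instead.
        return f"<VisibleUserId>{user_id}</VisibleUserId>"
```

The point is that the shared message scaffolding lives in one place, and each ILS driver only overrides the bits its server is picky about.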


[CODE4LIB] open source chat bots?

2007-12-03 Thread Andrew Nagy
Hello - there was quite a bit of talk about chat bots a year or 2 back.  I was 
wondering if anyone knew of an open source chat bot that works with jabber?


Re: [CODE4LIB] open source chat bots?

2007-12-03 Thread Andrew Nagy
Karen, we are building out a custom chat reference system with our new website 
redesign based on jabber.  Basically you will see all of the reference 
librarians who are logged in to the jabber server with a little picture/avatar 
along with their specialty areas.  The question is - who becomes the catch 
all - general reference librarian.  So we wanted to experiment with a chat bot 
and a reference script one of our reference librarians wrote up.  So if the 
student is totally clueless and doesn't know which librarian to pick - they can 
chat with a chat bot  or maybe we will hire Ms. Dewey!  Dunno if it will 
work out well - but something we want to play around with.  Then we could hook 
it up to our libstats implementation and automatically record all transactions. 
 An idea that we are just experimenting with at this stage.  I'll let you know 
when/if I get something up and running.


 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 K.G. Schneider
 Sent: Monday, December 03, 2007 12:18 PM
 Subject: Re: [CODE4LIB] open source chat bots?

 On Mon, 3 Dec 2007 10:14:29 -0500, Andrew Nagy
  Hello - there was quite a bit of talk about chat bots a year or 2
  I was wondering if anyone knew of an open source chat bot that works

 I'm afraid this isn't an answer, but several times last week I almost
 posted a similar query to DIG_REF. I'm interested in this response and
 in any responses that would lead to a discussion of an OSS virtual
 reference solution with critical-path VR components such as multiple
 logins, statistics, transcripts, etc.

 Karen G. Schneider

Re: [CODE4LIB] open source chat bots?

2007-12-03 Thread Andrew Nagy
 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 Wayne Graham
 Sent: Monday, December 03, 2007 12:47 PM
 Subject: Re: [CODE4LIB] open source chat bots?


 Not sure if this is what you're looking for, but in ColdFusion 7,

Stop right there, did you say coldfusion?  I think I just threw up in my mouth 
a little. :)

I would prefer something available in Java, C, C#, Perl, PHP, etc.
I was thinking about making my own - but I have too much on my plate as is, so I 
am looking for something to hack on from the open source market.


Re: [CODE4LIB] httpRequest javascript.... grrr

2007-11-29 Thread Andrew Nagy
Eric - Have a look at some of the AJAX functions I wrote for VuFind - there 
are some almost identical function calls that work just fine.
*checkout*/vufind/web/services/Record/ajax.js?revision=106
See function SaveTag

Also - You might want to consider using the Yahoo YUI Connection Manager or the 
Prototype AJAX toolkit.  They both work great and you don't need to spend time 
debugging.  I also find firebug (firefox plugin) to be an awesome ajax debugger.

Just by looking at your function real quick - you are calling 
httpRequest.send('') at the end of your function.  I think I read somewhere 
that you should send null and not an empty string.  Maybe that will solve it?  
Not really sure.


 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 Eric Lease Morgan
 Sent: Thursday, November 29, 2007 9:22 AM
 Subject: [CODE4LIB] httpRequest javascript grrr

 Why doesn't my httpRequest Javascript function return unless I add an
 alert? Grrr.

 I am writing my first AJAX-y function called add_tag. This is how it
 is suppose to work:
 (sic - supposed)

1. define a username
2. create an httpRequest object
3. define what is supposed to happen when it gets a response
4. open a connection to the server
5. send the request

 When the response is complete it simply echoes the username. I know
 the remote CGI script works because the following URL works correctly:

 My Javascript is below, and it works IF I retain the alert
 ( 'Grrr!' ) line. Once I take the alert out of the picture I get a
 Javascript error xmldoc has no properties. Here's my code:

function add_tag() {

 // define username
 var username  = 'fkilgour';

 // create an httpRequest
 var httpRequest;
 if ( window.XMLHttpRequest ) { httpRequest = new XMLHttpRequest(); }
 else if ( window.ActiveXObject ) { httpRequest = new ActiveXObject
 ( 'Microsoft.XMLHTTP' ); }

 // give the httpRequest some characteristics and send it off
 httpRequest.onreadystatechange = function() {

  if ( httpRequest.readyState == 4 ) {

   var xmldoc = httpRequest.responseXML;
   var root_node = xmldoc.getElementsByTagName( 'root' ).item( 0 );
   alert ( );


 };

 httpRequest.open( 'GET', './index.cgi?cmd=add_tag&username=' +
 username, true );
 httpRequest.send( '' );
 alert ( 'Grrr!' );


 What am I doing wrong? Why do I seem to need a pause at the end of my
 add_tag function? I know the anonymous function -- function() -- is
 getting executed because I can insert other httpRequest.readyState
 checks into the function and they return. Grrr.

 Eric Lease Morgan
 University Libraries of Notre Dame

 (574) 631-8604

Re: [CODE4LIB] httpRequest javascript.... grrr

2007-11-29 Thread Andrew Nagy
Don't leave out the Yahoo YUI library as something to consider.  What's nice is 
that you don't have to load the entire library as one big huge js file - you 
can pick and choose what libraries you want to include in your page, minimizing 
the javascript filesize.  If you want to have one little js widget on your page 
- the browser doesn't need to download and process a 150kb prototype js file.


 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 Jonathan Rochkind
 Sent: Thursday, November 29, 2007 10:24 AM
 Subject: Re: [CODE4LIB] httpRequest javascript grrr

 These days I think jquery seems more generally popular than prototype.
 But both are options. I definitely would use one or the other, instead
 of doing it myself from scratch. They take care of a lot of weird
 cross-browser-compatibility stuff, among other conveniences.


 Jesse Prabawa wrote:
  Hi Eric,
  Have you considered using a Javascript Library to handle these
 details? I
  would recommend that you refactor your code to use one so that you
  concentrate on what you actually want to do instead. This way you can
  avoid having browser incompatabilities that are already solved if you
 use a
  Javascript Library. Try checking out Prototype at
  Best regards,
  On Nov 29, 2007 10:21 PM, Eric Lease Morgan [EMAIL PROTECTED] wrote:
  Why doesn't my httpRequest Javascript function return unless I add an
  alert? Grrr.
  I am writing my first AJAX-y function called add_tag. This is how it
  is supposed to work:
1. define a username
2. create an httpRequest object
3. define what is supposed to happen when it gets a response
4. open a connection to the server
5. send the request
  When the response is complete it simply echoes the username. I know
  the remote CGI script works because the following URL works correctly:
  My Javascript is below, and it works IF I retain the alert
  ( 'Grrr!' ) line. Once I take the alert out of the picture I get a
  Javascript error xmldoc has no properties. Here's my code:
function add_tag() {
 // define username
 var username  = 'fkilgour';
 // create an httpRequest
 var httpRequest;
 if ( window.XMLHttpRequest ) { httpRequest = new
 XMLHttpRequest(); }
  else if ( window.ActiveXObject ) { httpRequest = new ActiveXObject
  ( 'Microsoft.XMLHTTP' ); }
 // give the httpRequest some characteristics and send it off
 httpRequest.onreadystatechange = function() {
  if ( httpRequest.readyState == 4 ) {
   var xmldoc = httpRequest.responseXML;
    var root_node = xmldoc.getElementsByTagName( 'root' ).item( 0 );
   alert ( );
 };
 httpRequest.open( 'GET', './index.cgi?cmd=add_tag&username=' +
  username, true );
 httpRequest.send( '' );
 alert ( 'Grrr!' );
  What am I doing wrong? Why do I seem to need a pause at the end of my
  add_tag function? I know the anonymous function -- function() -- is
  getting executed because I can insert other httpRequest.readyState
  checks into the function and they return. Grrr.
  Eric Lease Morgan
  University Libraries of Notre Dame
  (574) 631-8604

 Jonathan Rochkind
 Digital Services Software Engineer
 The Sheridan Libraries
 Johns Hopkins University
 rochkind (at)

[CODE4LIB] Access 2007 summary

2007-11-28 Thread Andrew Nagy
Does anyone know of or have an in-depth review of the Access 2007 conference?  
Was there video captured?  I was unable to attend - but wanted to check it out 
this year.


[CODE4LIB] Position: Programmer at Villanova University Library

2007-11-06 Thread Andrew Nagy
Library Software Development Specialist
Falvey Library, Villanova University

This position reports to the Technology Management Team and is responsible for 
designing, developing, testing and deploying new technology methods, tools and 
resources to extend and enhance digitally-mediated or digitally-delivered 
library services, including but not limited to, Web interfaces, digital 
reference and research assistance, digitization and digital library 
development, institutional repository services, portalization and 
personalization of library resources, the integration of handheld devices into 
the library service environment, Web content management, collaboration 
software, staff Intranet services, online knowledge base development, and 
related areas.  This person will also serve as trainer and mentor to librarians 
and other library staff involved in new technology initiatives, with an 
emphasis on skill transfer, skill development, and the expansion of the 
library's technology base in support of continuously improving digital services 
for library users.

Requirements include:  Bachelor's degree in computer science, information 
systems or a related field required; 1 year of professional experience 
developing and implementing technology projects in a collaborative, team-based, 
goal-oriented environment; ability to work independently on programming and 
technology implementation projects; ability to listen to and act upon the needs 
and suggestions of others, in support of user-oriented systems design and 
development; excellent analytical skills to support problem solving, systems 
analysis, software functional specification, and debugging; ability to juggle 
multiple competing priorities; excellent writing skills for the preparation of 
clear, user-oriented documentation; capacity for higher-level strategic 
analysis of technology trends; working knowledge of PC and Unix-based computing 
platforms and operating systems; working knowledge of web development tools and 
technologies, including PHP, ASP, .Net, Java, HTML and CSS, AJAX, XML, XSLT and 
XQuery; working knowledge of Unix server administration and related scripting 
languages; working knowledge of SQL, database systems, and basic principles of 
database design.

You may email resumes, but please include a cover letter, resume and references 
in only one attachment.  Please submit resumes to [EMAIL PROTECTED], or fax 
to (610) 519-6667.  Please send only one resume.

For further information, call Barbara Kearns at ext. 9-4235 or the Villanova 
Job Hotline at (610) 519-5900

Re: [CODE4LIB] Libstats is looking for project leaders

2007-10-26 Thread Andrew Nagy
Nate, we use LibStats religiously here.  I would be interested in joining the 
community - but similarly to you, I don't have much time to spare.


 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 Nathan Vack
 Sent: Friday, October 26, 2007 12:42 PM
 Subject: [CODE4LIB] Libstats is looking for project leaders

 Hi all,

 I was recently involved in a discussion about the mechanics of
 running an open-source project over at Library Web Chic, and I've
 come to the conclusion that for a project to succeed, it really needs
 to have at least a small, dedicated community. A community of one is
 no community at all ;-)

 For the last few years, I've been in charge of running Libstats, a
 small, GPL'd reference statistics tracking / knowledgebase project.
 For a variety of reasons*, I'm unlikely to have a significant amount
 of time to devote to the project ever again... and there are a lot of
 things that could use improvement, ranging from squashing bugs to
 improving documentation to adding features to answering support

 So... here's my call for volunteers. This project is quite small
 (6400 LoC), PHP / MySQL-based, and seems to work pretty well for the
 majority of its users -- it'd be a great place for someone new to
 open-source project management to learn the ropes. I'd especially
 like someone outside our university to have some ownership of the

 Interested? Head over to --
 that's where the party's at.

 -Nate Vack
 Wendt Library
 University of Wisconsin - Madison

 * Full disclosure: I'm also working on a hosted, closed-source
 competitor to this project... so for me to stay solely in charge of
 Libstats would be conflict-of-interest-central. That's not my only
 reason, but it's a big one.

Re: [CODE4LIB] LC class scheme in XML or spreadsheet?

2007-09-25 Thread Andrew Nagy
This topic came up a few weeks ago on code4lib too, where were you Ed!? :)

I will echo something that Roy mentioned in the thread from a few weeks back: 
would the LOC be willing to create a web service where you could supply a call 
number and it would return the hierarchy of topic areas for that number?
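As a sketch of what such a lookup service could do, assuming the full Classification Outline were loaded (only a couple of captions are shown here):

```python
import re

# A tiny excerpt of the LC Classification outline; a real service would load
# the complete outline (captions abridged here).
OUTLINE = {
    "Q": "Science",
    "QA": "Mathematics",
    "Z": "Bibliography. Library Science",
}

def hierarchy(call_number):
    """Return outline captions for each letter prefix of an LC call number."""
    m = re.match(r"([A-Z]{1,3})", call_number.upper())
    if not m:
        return []
    letters = m.group(1)
    return [OUTLINE[letters[:i]]
            for i in range(1, len(letters) + 1)
            if letters[:i] in OUTLINE]
```

So hierarchy("QA76.76 .C672") walks the prefixes Q, then QA, and returns their captions in broadest-to-narrowest order.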


 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 Ed Summers
 Sent: Monday, September 24, 2007 8:19 PM
 Subject: Re: [CODE4LIB] LC class scheme in XML or spreadsheet?

 It's funny this subject just came up on one of the open-library
 discussion lists this week [1]. A whiles ago now Rob Sanderson, Brian
 Rhea (University of Liverpool) and I pulled down the LC Classification
 Outline pdf files, converted them to text, wrote a python munger to
 convert the text into what ended up being a SKOS RDF file. We made the
 code available [2] and you can see the resulting SKOS (which needs
 some URI work) [3].

 It's kind of a work in progress (still). I wanted to get to the point
 that the rdf file was leveraged in a little python library (possibly
 as a pickled data structure) for easily validating LC numbers and
 looking them up in the outline.

 I'd be interested in any feedback.



[CODE4LIB] LCC classifications in XML

2007-08-28 Thread Andrew Nagy
Does anyone know of a place where the LCC Callnumber classifications can be 
found in a parseable format such as XML?


Re: [CODE4LIB] LCC classifications in XML

2007-08-28 Thread Andrew Nagy
Yes please! Is Ed listening in?


 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 Jonathan Brinley
 Sent: Tuesday, August 28, 2007 3:36 PM
 Subject: Re: [CODE4LIB] LCC classifications in XML

 Not long ago, I recall Ed Summers sharing the classification outline
 in RDF. I may still have a copy of that around if you're interested.

 Have a nice day,

  On 8/28/07 12:16 PM, Andrew Nagy [EMAIL PROTECTED] wrote:
   Does anyone know of a place where the LCC Callnumber
 classifications can be
   found in a parseable format such as XML?

 Jonathan M. Brinley



2007-08-13 Thread Andrew Nagy
 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 Will Kurt

 One of the things that's really lacking in the library community is
 something like a to serve as a central repository for
 all opensource library projects and this certainly sounds like a step
 in the right direction (maybe there already is such a thing and I
 don't know about it).  I'm sure many people out there have at least
 snippets of code or various libraries that they might not know where
 to publish or are already publishing but other people don't know
 where to find them.

I totally agree.  I had always wished to have a place on code4lib for people to 
share snippets of code.  A MARC library, or an XSLT doc, etc.

The code that runs the repository site is open source.  I think it 
would be neat to have a code repository like PEAR/CPAN where we can all share 
code snippets and documentation for the code.


Re: [CODE4LIB] hosting

2007-07-30 Thread Andrew Nagy
In case I can't make the conversation, I must suggest Bastille - a Linux 
package that handles firewalling and IP masquerading.  I have been using it for 
about 8 years now and have never had a Linux box running it hacked.

I even had my ISP kill my network connection once because my server was being 
attacked by thousands of machines; not one attack got through, and the machine 
never experienced any performance degradation.

Good luck

 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 Ed Summers
 Sent: Friday, July 27, 2007 5:18 PM
 Subject: [CODE4LIB] hosting

 As you may have seen or experienced, the site is down for the count
 at the moment because of some hackers^w crackers who compromised anvil
 and defaced various web content and otherwise messed with the
 operating system. anvil is a machine that several people in the
 code4lib community run and pay for themselves.

 Given that code4lib has grown into a serious little gathering, with
 lots of effort being expended by the likes of Jeremy Frumkin and Brad
 LaJenuesse to make things happen -- it seems a shame to let this sort
 of thing happen. We don't have any evidence, but it seems that the
 entry point was the fact that various software packages weren't kept
 up to date.

 Anyhow, this is a long way of inviting you to a discussion Aug 1st
 @7PM GMT in irc:// to see what steps need to
 be taken to help prevent this from happening in the future.
 Specifically we're going to be talking about moving some of the web
 applications to institutions that are better set up to manage them.

 If this interests you at all try to attend!


Re: [CODE4LIB] parse an OAI-PMH response

2007-07-30 Thread Andrew Nagy
Andrew, I began building a PHP OAI client library based on an OAI server library 
that I wrote a while back.  The OAI client library is not complete, but it can 
get you started.  I attached it in a file called Harvester.php
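The attached Harvester.php isn't reproduced in the archive, but the request/parse cycle it implements can be illustrated in a few lines — shown here in Python rather than PHP for brevity, and parsing a canned response rather than a live endpoint (the example URL in the comment is hypothetical):

```python
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def fetch(base_url, **params):
    """Send an OAI-PMH request and return the raw XML response text."""
    query = "&".join(f"{k}={v}" for k, v in params.items())
    with urllib.request.urlopen(f"{base_url}?{query}") as resp:
        return resp.read().decode("utf-8")

def record_identifiers(xml_text):
    """Pull the header <identifier> of every record in a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [h.findtext(OAI + "identifier") for h in root.iter(OAI + "header")]

# Parsing a canned response; a live harvest would look like
# record_identifiers(fetch("https://example.org/oai", verb="ListRecords",
#                          metadataPrefix="oai_dc")) -- endpoint hypothetical.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:demo:1</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""
print(record_identifiers(SAMPLE))  # ['oai:demo:1']
```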


 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 Andrew Hankinson
 Sent: Friday, July 27, 2007 9:32 PM
 Subject: [CODE4LIB] parse an OAI-PMH response

 Hi folks,
 I'm wanting to implement a PHP parser for an OAI-PMH response from our
 Dspace installation.  I'm a bit stuck on one point: how do I get the
 script to send a request to the OAI-PMH server, and get the XML
 response in
 return so I can then parse it?

 Any thoughts or pointers would be appreciated!


Description: Harvester.php

Re: [CODE4LIB] marc2oai

2007-05-29 Thread Andrew Nagy
 -Original Message-
 From: Code for Libraries [mailto:[EMAIL PROTECTED] On Behalf Of
 Eric Lease Morgan
 Sent: Tuesday, May 29, 2007 1:53 PM
 Subject: [CODE4LIB] marc2oai

 Does anybody here know of a MARC2OAI program?

Eric, I have a small script that does this; it is fairly simple.  
Probably about 100 lines of code or so.

I have a nightly cron script that gets any new/modified marc records from the 
past 24 hours out of the catalog and then runs marc2xml on the dump file.  Then 
I have a small script that breaks up the large marcxml files into individual 
xml files and imports them into SOLR!  I can then use an XSL stylesheet such as 
the LOC's marc2oai to produce an OAI document, or marc2rdf, etc., on the full 
marcxml files (since solr doesn't store the original record).  I have yet to 
incorporate my OAI server code into this, but since it is already written, it 
would be a fairly easy merge.
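The splitting step in that pipeline can be sketched as follows — a standard-library Python illustration only, not the cron script itself; the Solr POST of each chunk to the update handler is omitted:

```python
import xml.etree.ElementTree as ET

MARC = "{http://www.loc.gov/MARC21/slim}"

def split_collection(xml_text):
    """Split a MARCXML <collection> into one serialized string per <record>."""
    root = ET.fromstring(xml_text)
    return [ET.tostring(rec, encoding="unicode")
            for rec in root.iter(MARC + "record")]

SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record><leader>00925njm a22002777a 4500</leader></record>
  <record><leader>00925njm a22002777a 4500</leader></record>
</collection>"""

print(len(split_collection(SAMPLE)))  # 2
```

Each returned chunk would then be transformed (e.g. by the XSLT mentioned above) and sent to Solr.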

This is all built into my NextGen OPAC that I am working on and hope to 
open-source sometime this summer.  So sorry, I'm not allowed to hand out the 
code just yet :(


[CODE4LIB] Posting Presentations

2007-03-07 Thread Andrew Nagy

I am still having difficulty posting my presentation to the C4L
website.  I am getting an error about my file not being authorized or
something to that effect.  I did not try last night, but I will try
again tonight.

Has anyone checked to make sure that this is working?


Re: [CODE4LIB] Preconference

2007-02-22 Thread Andrew Nagy

You can find my schema file to match the XSLT doc at:


Emily Lynema wrote:

Hi Andrew,

I was thinking about using your marcxml2solr.xsl to quickly transform
my marcxml to solr input for testing. Do you have a solr schema file
as well that could be used to jumpstart the system?


Andrew Nagy wrote:

Andrew Nagy wrote:

I have an XSLT doc for transforming MARCXML to SOLR XML that I can
share around.

I was asked if I could post my XSLT doc, so here it is!

It is probably somewhat geared toward my collection of data, and I had
some custom scripting for determining the format more accurately, but I
removed it for compatibility reasons.  This will give you a chance to
play with some data before the preconference.


Re: [CODE4LIB] Preconference

2007-02-13 Thread Andrew Nagy

Andrew Nagy wrote:

I have an XSLT doc for transforming MARCXML to SOLR XML that I can
share around.

I was asked if I could post my XSLT doc, so here it is!

It is probably somewhat geared toward my collection of data, and I had
some custom scripting for determining the format more accurately, but I
removed it for compatibility reasons.  This will give you a chance to
play with some data before the preconference.


Re: [CODE4LIB] Very large file uploads, PHP or possibly Perl

2007-02-09 Thread Andrew Nagy

I have done large file uploads in PHP.  Make sure you have the following
set in php.ini:

upload_max_filesize = some large size followed by M for megabyte or G
for gigabyte
file_uploads = on
post_max_size = some large size

Also, you can set these values through the ini_set function in PHP so
that they apply per script instead of to every script, which
allows a more granular level of control for security reasons, etc.

I have never used the form input value, nor should you have to change
the memory_limit very much since the file itself is not loaded into
memory, just information regarding the file.


Thomas Dowling wrote:

I have always depended on the kindness of strange PHP gurus.

I am trying to rewrite a perpetually buggy system for uploading large
PDF files (up to multiple tens of megabytes) via a web form.  File
uploads are very simple in PHP, but there's a default maximum file size
of 2MB.  Following various online hints I've found, I've gone into
php.ini and goosed up the memory_limit, post_max_size, and
upload_max_size (and restarted Apache), and added an appropriate hidden
form input named MAX_FILE_SIZE.  The 2MB limit is still in place.

Is there something I overlooked?  Or, any other suggestions for how to
take in a very large file?

[My current Perl version has a history of getting incomplete files in a
non-negligible percentage of uploads.  Weirdness ensues: whenever this
happens, the file reliably cuts off at the same point, but the cutoff is
not a fixed number of bytes, nor is it related to the size of the file.]

Thomas Dowling

Re: [CODE4LIB] a few code4lib conference updates

2007-01-19 Thread Andrew Nagy

Nathan Vack wrote:

On Jan 19, 2007, at 9:51 AM, LaJeunesse, Brad wrote:

I must strongly encourage everyone attending to bring
fully-charged laptops and spare batteries (if you have them). The
auditorium has 60 power outlets available, which gives us roughly a
ratio of outlets to people.

 Spare batteries are rather expensive... but power strips are dead cheap.

Doesn't everyone travel with a power strip in their laptop bag?

Maybe we could have some wireless power stations?


Re: [CODE4LIB] Getting data from Voyager into XML?

2007-01-17 Thread Andrew Nagy

Nathan Vack wrote:

Hey cats,

I'm starting to think (very excitedly) about the Lucene session, and
realized that I'd better get our data into an XML form, so I can do
interesting things with it.

Anyone here have experience (or code I could steal) dumping data from
Voyager into... anything? I'm happy working in PHP, Java, Ruby, or
perl -- though happiest, probably, in Ruby.

Nate, it's pretty easy.  Once you dump your records into a giant MARC
file, you can run marc2xml.  Then run an
XSLT against the marcxml file to create your SOLR xml docs.

One thing I am hoping that can come out of the preconference is a
standard XSLT doc.  I sat down with my metadata librarian to develop our
XSLT doc -- determining which fields should be searchable, which fields
should be left out to help speed up results, etc.

It's pretty easy, I think you will be amazed how fast you can have a
functioning system with very little effort.


Re: [CODE4LIB] Getting data from Voyager into XML?

2007-01-17 Thread Andrew Nagy

Bess Sadler wrote:

As long as we're on the subject, does anyone want to share strategies
for syncing circulation data? It sounds like we're all talking about
 the parallel systems à la NCSU's Endeca system, which I think is a
great idea. It's the circ data that keeps nagging at me, though. Is
there an elegant way to use your fancy new faceted browser to search
against circ data w/out re-dumping the whole thing every night?

I will talk about this in my presentation at the conference.
Syncing every night is too infrequent if you ask me.  I considered
syncing every 15 minutes, until I stepped back, looked at that
idea realistically, and laughed at myself.

Our system (going into beta next week!) is using realtime SQL calls for
location, status, etc. to our Voyager DB.


Re: [CODE4LIB] Getting data from Voyager into XML?

2007-01-17 Thread Andrew Nagy

Nathan Vack wrote:

Unless I'm totally, hugely mistaken, MARC doesn't say anything about
holdings data, right? If I want to facet on that, would it make more
sense to add holdings data to the MARC XML data, or keep separate xml
files for holdings that reference the item data?

As others have said, you can get *some* holdings data in a marcxml file,
but nothing that will help you.  Holdings data especially could
change at a moment's notice.  You will have to get access to your
holdings data some other way, on a real-time (or 15 - 30 minute) delay.


Re: [CODE4LIB] lucene pre-conference - reminder

2006-12-19 Thread Andrew Nagy

Bess, do you have a set time for the pre-conference?  I need to change
my air flight reservations so I can make it.


Bess Sadler wrote:

Hey, code4libbers,

If you are attending code4lib con 2007, you might also want to attend
the one day pre-conference workshop about lucene and solr (and how to
use them to index / search / browse library collections). It will be
taught by the incomparable Erik Hatcher (author of _Java Development
with Ant_ and _Lucene in Action_). Registration is free, but seats
are limited, so if you want to attend please make sure to reserve a
spot. Registration consists of sending me an email and telling me you
plan to attend.

The following list are the people who have registered. If you're not
on this list, then I haven't reserved you a spot. Please let me know
asap if you plan to come so we can plan our seating and space needs.


Bess Sadler

People who have registered for the pre-conference:
Adam Soroka
Andrea Goethals
Andrew Darby
Andrew Nagy
Antonio Barrera
Art Rhyno
Bess Sadler
Dan Scott
Ed Summers
Edwin Sperr
Emily Lynema
Jonathan Gorman
Jonathan Rochkind
Kevin S. Clarke
Kristina Long
Michael Doran
Michael Witt
Mike Beccaria
Parmit Chilana
Peter Binkley
Ross Singer
Spencer McEwen
Steve Toub
Tito Sierra
Tom Keays
Winona Salesky

Elizabeth (Bess) Sadler
Head, Technical and Metadata Services
Digital Scholarship Services
Box 400129
Alderman Library
University of Virginia
Charlottesville, VA 22904

(434) 243-2305

Re: [CODE4LIB] code4lib lucene pre-conference

2006-12-13 Thread Andrew Nagy

Erik Hatcher wrote:

At this point, I'm planning on winging it with the datasets.  By late
February I will have (high on my TODO list now!) built a light-weight
Solr mechanism for bringing in MARC data, and perhaps more (iTunes
data files would make a fun one) and doing simple skinnable front-
ends on Solr.  Rails at least, but also demo the various formats that
Solr can output making it pluggable into whatever environment easily.

Erik, here is an XSLT doc I created for transforming MARCXML to SOLR
XML.  It has some PHP components in it that turn some of the ugly
MARC data into something friendlier.  It also has some logic based on
our data, but is fairly generic.

I was hoping that during the preconference we could all discuss this
transformation process.  I have been working with our metadata librarian
on determining which fields should be included and which should be
grouped together for indexing and searching processes.  However, someone
out there might have some better ideas on how best to transform the
data into SOLR.

?xml version=1.0 encoding=utf-8?
xsl:stylesheet version=1.0

  xsl:output method=xml indent=yes encoding=utf-8/

  xsl:template match=/
  xsl:call-template name=record/

  xsl:template name=record
xsl:for-each select=//record
  field name=idxsl:value-of select=[EMAIL PROTECTED]//field
  field name=formatxsl:value-of select=php:functionString('getFormat', ./leader, ./[EMAIL PROTECTED])//field
  field name=languagexsl:value-of select=substring(./[EMAIL PROTECTED], 36, 3)//field

  field name=isbnxsl:value-of select=[EMAIL PROTECTED]/[EMAIL PROTECTED]'a']//field

  field name=issnxsl:value-of select=[EMAIL PROTECTED]/[EMAIL PROTECTED]'a']//field

xsl:when test=[EMAIL PROTECTED]
  field name=callnumberxsl:value-of select=[EMAIL PROTECTED]/[EMAIL PROTECTED]'a']/xsl:value-of select=[EMAIL PROTECTED]'090']/[EMAIL PROTECTED]'b']//field
  xsl:if test=[EMAIL PROTECTED]
field name=callnumberxsl:value-of select=[EMAIL PROTECTED]/[EMAIL PROTECTED]'a']/xsl:value-of select=[EMAIL PROTECTED]'050']/[EMAIL PROTECTED]'b']//field

  field name=authorxsl:value-of select=[EMAIL PROTECTED]/[EMAIL PROTECTED]'a']//field

  field name=authorxsl:value-of select=[EMAIL PROTECTED]/[EMAIL PROTECTED]'a']//field

  field name=authorxsl:value-of select=[EMAIL PROTECTED]/[EMAIL PROTECTED]'a']//field

  field name=authorxsl:value-of select=[EMAIL PROTECTED]/[EMAIL PROTECTED]'a']//field

  field name=titlexsl:value-of select=[EMAIL PROTECTED]/[EMAIL PROTECTED]'a']/ xsl:value-of select=[EMAIL PROTECTED]'245']/[EMAIL PROTECTED]'b']//field

  field name=title2xsl:value-of select=[EMAIL PROTECTED]/[EMAIL PROTECTED]'a']//field

  field name=publishDatexsl:value-of select=[EMAIL PROTECTED]/[EMAIL PROTECTED]'c']//field

  field name=dateSpanxsl:value-of select=[EMAIL PROTECTED]/[EMAIL PROTECTED]'a']//field

  field name=seriesxsl:value-of select=[EMAIL PROTECTED]/[EMAIL PROTECTED]'a']//field

  field name=seriesxsl:value-of select=[EMAIL PROTECTED]/[EMAIL PROTECTED]'a']//field

  xsl:call-template name=subjects/

  xsl:for-each select=[EMAIL PROTECTED]
field name=Author2xsl:value-of select=./[EMAIL PROTECTED]'a']//field

field name=oldTitlexsl:value-of select=[EMAIL PROTECTED]/[EMAIL PROTECTED]'a']//field

field name=newTitlexsl:value-of select=[EMAIL PROTECTED]/[EMAIL PROTECTED]'a']//field

field name=seriesxsl:value-of select=[EMAIL PROTECTED]/[EMAIL PROTECTED]'a']//field

field name=urlxsl:value-of select=[EMAIL 

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-29 Thread Andrew Nagy

Clay Redding wrote:

Hi Andrew (or anyone else that cares to answer),

I've missed out on hearing about incompatibilities between MARCXML and
NXDBs.   Can you explain?  Is this just eXist and Sleepycat, or are
there others?  I seem to recall putting a few records in X-Hive with no
problems, but I didn't put it through any paces.

Yes, I have only done my testing with eXist and Sleepycat, but I also
have an implementation of MarkLogic that I would like to test out.  I
imagine though that all NXDBs will have the same problem.  This is the
heart of my proposed talk.  It has to do with the layout of marcxml.
Adding a few records to any NXDB will work like a charm, do your testing
with 250,000+ records and then you will begin to see the true spirit of
your NXDB.

Also, if there was a cure to the problems with MARCXML (I'm sure we can
all think of some), what would you suggest to help alleviate the

Sure, I know of a cure!  I have come up with a modified marcxml schema,
but as I am investigating SOLR further, I think the solr schema is also
a cure.

The problem with MARCXML is the fact that all of the elements have the
same name and then use attributes to differentiate them (excuse me
while I barf); this makes indexing at the XML level very difficult,
especially for NXDBs.  I got concurring agreement from the main developers
of both packages (eXist, Berkeley) on this front.  My schema just puts
each of the marc fields into its own element.  Instead of datafield
tag=245, I created a field called T245, and instead of all of the
subfields in multiple tags, I just put all of the subfields into one
element.  No one needs to search (from my perspective) the subtitle
(b) separately from the main (a) title, so I just made a really
simple xml document that is 1/4 the size.  By doing this I was able to
take a 45 minute search of marcxml records and reduce it down to results
in 1 second.  The main boost was not the reduction in file size, but the
way the indexing works.

Give it a shot, I promise better results!
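The reshaping described above can be sketched like this — a simplified illustration, not the actual schema; real MARCXML carries a namespace, indicators, and control fields, all omitted here:

```python
import xml.etree.ElementTree as ET

MARCXML = """<record>
  <datafield tag="245">
    <subfield code="a">Main title :</subfield>
    <subfield code="b">a subtitle</subfield>
  </datafield>
  <datafield tag="100">
    <subfield code="a">Nagy, Andrew</subfield>
  </datafield>
</record>"""

def flatten(record_xml):
    # Collapse each <datafield tag="NNN"> into one <TNNN> element whose
    # text is the concatenated subfield values.
    src = ET.fromstring(record_xml)
    out = ET.Element("record")
    for df in src.findall("datafield"):
        el = ET.SubElement(out, "T" + df.get("tag"))
        el.text = " ".join(sf.text for sf in df.findall("subfield"))
    return ET.tostring(out, encoding="unicode")

print(flatten(MARCXML))
```

The result is one element per field (`<T245>Main title : a subtitle</T245>`), which is what makes element-name indexing in an NXDB cheap.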


Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-29 Thread Andrew Nagy

Kevin S. Clarke wrote:

Fwiw Andrew, I'd suggest you are not seeing the true spirit of your
NXDB.  Try to put MARC into a RDBMS and you are going to run into the
same problem.  You have to index intelligently or reorganize the data
(which is the default when you put XML into a RDBMS anyway).  Perhaps
 a criticism of NXDBs could be that they make it sound like they can
handle anything you throw at them without regard for what that is...
If it is XML, we can handle it.

I agree, and that is why I have refactored the marcxml into a format
that I feel an NXDB can handle.  They cannot handle any XML format, and
I have heard confessions from the developers of these systems about this
point exactly.  It seems that we can all agree that both marc and
marcxml are bad formats!

Data can have a structure that makes it more accessible or less.  The
promise of XML (as a storage format rather than transmission format
(which is its other purpose)) is that you can work with data in its
native format (no deconstruction necessary).  However, there is
nothing about XML or NXDBs that makes one use a well structured data

No, you are right.  NXDB's are too dumb to determine if your XML format
is going to work or not.  But the wonders of XSLT make it simple to
transform to another modified format that an NXDB can handle well.

So ... while we are on this topic: you wouldn't want to index marcxml
records in lucene, you would use marc21, right?  Why deal with the
overhead of xml if it is not necessary?  We have to format our data no
matter what to best fit our storage/search system.


Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-28 Thread Andrew Nagy

Erik Hatcher wrote:

 "What if" games are mostly just guessing games in the high-tech
 world.  Agility is the trait our projects need.  Software is just
that... soft.  And malleable.  Sure, we can code ourselves into a
corner, but generally we can code ourselves right back out of it
too.  If software is built with decent separation of concerns, we can
adapt to changes readily.

I completely agree, but you can't deny it's a valid concern.  I am
always thinking about the future and making sure my software is modular
and flexible so any part can easily be replaced.  So I would hope it's
as easy as just writing a new driver for a new system that you want to
replace with.

Anyway, you have all convinced me to give solr a whirl ... I'm
downloading it right now.


Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-28 Thread Andrew Nagy

Art Rhyno wrote:

I made a big mistake along the way in trying to work with Voyager's call
number setup in Oracle, and dragged Ross along in an attempt to get past
Oracle's constant quibbles with rogue characters in call number ranges.
The idea was to expose the library catalogue as a series of folders using
said call number ranges. This part works well enough when the characters
are dealt with, but breaks down a bit for certain formats. For example,
the University of Windsor lumps most of its microfiche holdings in one
call number with an accession number, and Georgia Tech does something
similar with maps. This can mean individual webdav folders with many
thousands of entries, and some less than elegant workarounds.

So you are replacing SQL calls with WebDAV?  Can you explain this a bit


Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-28 Thread Andrew Nagy

Kevin S. Clarke wrote:

By the way, I see a very interesting intersection between Solr and
XQuery because both are speaking XML.  You may have XQueries that
generate the XML that makes Solr do it's magic for instance.  This is
an alternative to fulltext in XQuery, sure... it is something that is
here today (doesn't mean I'll stop thinking about tomorrow though).

There is a good intersection, but if you look at the roadmap for eXist
(native xml database) they have many of the features that solr offers
(I'm still in the process of setting up solr, so I am not too in-depth with
the features yet).  eXist is basically an attempt at this intersection.
Too bad it's just too damn slow and still in its infancy.


Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-28 Thread Andrew Nagy

Casey Durfee wrote:

I thought that was the point of using interfaces?  I guess I don't get why you 
need a standard to be compelled to do something you should be doing anyway -- 
coding to interfaces, not implementations.

Interfaces work well with like products (a database abstraction library
is a great example), however interfaces don't lend well to products that
achieve a similar goal but work differently altogether.  Relational
databases all work the same: there are databases, each database has
tables, views, procedures, etc. and each table has columns, etc.
However, more infantile systems such as xml storage systems are hard to
map in a similar fashion.  I ran into this exact problem: I developed a
system around eXist and developed an interface for the data layer and a
driver for interacting with exist.  I then wanted to compare other
databases such as berkeley db xml.  I quickly found that they achieve a
common goal, but do not implement the same concepts making them very
hard to compare.  eXist has collections to group your xml into
distinct groupings and db xml does not.  In my interface I had a method
called getCollections, but since db xml does not have anything like
this, I could not use that method.  So now how would you develop an
interface that would include various xml databases as well as full-text
index systems such as lucene, etc.  I would imagine this would be very

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-28 Thread Andrew Nagy

Kevin S. Clarke wrote:

Have you had a chance yet to evaluate the 1.1 development line?  It is
supposed to have solved the scaling issues.  I haven't tried it myself
(and remain skeptical that it can scale up to the level that we talk
about with Lucene (but, as you point out, it is trying to do more than
Lucene too)).

I gave the 1.1 line a shot, but still saw abysmal results ... I sent
Wolfgang (the lead guy) my marcxml records and he implemented it in my
development environment and found the same issues.  The major problem
with it all is the ugly mess that is marcxml and it's incompatability
with native xml dbs.  Although, I still have some ideas that I have not
had a chance to test yet under the 1.1 branch.

I just finished coding our beta OPAC, so I am now heading back into my
load and scalability testing.  I am using Berkeley DB XML, which beats the
pants off of eXist in performance but has nowhere near the feature set of
eXist.  I plan to re-test eXist 1.1 on my production server so I can get
a better handle on the speeds on a machine with a bit more beef.

I am also going to give this Nux a shot too.  Anyone out there using it?

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-27 Thread Andrew Nagy

Bess Sadler wrote:

Enough people are interested in ILS related topics that it might be
worth forming groups around specific ILS products. If you are one of
these people, email the list if you're interested in setting up such
a thing.

Bess, this sounds like a great conversation.  You can count me in.
Could you please describe the time for when this might occur as I have
already booked my flight into Atlanta for late in the afternoon so I
would need to change that if you plan on having the session earlier in
the day.

We just last week finished up the beta release of our new OPAC, which is
built on a native XML database with modified MARCXML records, but we
have been somewhat disappointed with the XML database's
search times.  I have been considering looking at other options such as
lucene-based products (XTF, etc).  This would be a great topic for me.


Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-27 Thread Andrew Nagy

Bess Sadler wrote:

Hi, Andrew. Since this will be an all-day event, the session would be
starting first thing in the morning on Feb 27. I'm thinking 9am, but
I haven't confirmed that with anyone else. I'm just flying by the
seat of my pants here.

I wouldn't be able to make this then due to time constraints.

That way you can use solr / lucene for search, faceted
browse, etc, and your XML database only for known item retrieval,
which it is generally able to do without performance issues.

I am doing something similar, except I am using my file system as my
database for pulling the full marcxml records.  This has as little
overhead as possible.  Now think about the possibilities of using
something like lucene or postgres as your filesystem.  There are
groups that have been working on such filesystems for years.

hopping up and down waiting for someone to take this approach with an
ILS, so please come and show us what you've got!

I have proposed a talk on my trials and tribulations of developing this
at this years code4lib conference.  If it is accepted I will share all
the gory details.

BTW, have you played with Hadoop?  I guess it's something like an
open-source attempt at Google's infrastructure.  I would be curious
about implementing hadoop across a few servers to store the marcxml records.


Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-27 Thread Andrew Nagy

Binkley, Peter wrote:

There would probably be a lot of optimizations you could do within Solr
to help with this kind of thing. Art and I talked a little about this at
the ILS symposium: why not nestle the XML db inside Solr alongside
Lucene? Solr could then manage the indexing of the contents of the db,
and augment your search results with data from the db: you could get
full records as part of your search results without having to store them
in the Lucene index.

At this point, why use a DB?  Just store your records in your server
file system.  It's fast, and there are fewer applications to worry about
maintaining.  If your search matches 5 records, just open those 5 files
on your server.
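A filesystem record store of this kind is only a few lines — a minimal Python sketch, assuming records are keyed by catalog id; the two-character fan-out directory is an assumption to keep any one directory from growing huge:

```python
import os
import tempfile

def save_record(root, rec_id, xml_text):
    """Write one record to <root>/<last two chars of id>/<id>.xml; the
    fan-out keeps any single directory from growing huge."""
    subdir = os.path.join(root, rec_id[-2:])
    os.makedirs(subdir, exist_ok=True)
    path = os.path.join(subdir, rec_id + ".xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(xml_text)
    return path

def load_record(root, rec_id):
    """Read one record back by id."""
    path = os.path.join(root, rec_id[-2:], rec_id + ".xml")
    with open(path, encoding="utf-8") as f:
        return f.read()

store = tempfile.mkdtemp()
save_record(store, "vu123456", "<record/>")
print(load_record(store, "vu123456"))  # <record/>
```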

Good conversations ... getting excited for the conference already!


[CODE4LIB] Announcing the Villanova University Digital Library

2006-09-06 Thread Andrew Nagy

The staff of Falvey Memorial Library proudly announces the grand opening
of the Villanova University Digital Library.

The Digital Library is a repository of many digitized items from our
Special Collections as well as other donated items and partnering
institutions. The repository was developed by library staff and built
from an open source platform. The repository uses a native XML database,
eXist, to store and organize our digital objects encoded in the METS
format. The web site allows for users to search and view all of the
items stored in the repository by using many of the wonderful XML
technologies such as XQuery and XSLT.

Noteworthy initial digital collections include: the complete collection
of Cuala Press Broadsides, notable as a primary source for many folk
songs and for the illustrations of Jack Yeats – brother of the Poet
laureate; a signed and edited copy of Memoranda During the War by Walt
Whitman; personal letters and books from the Joseph McGarrity Collection
dealing with Irish and Irish-American History, an illuminated manuscript
of selections from the Holy Koran, and plenty more! We will be
constantly adding more and more items, so please check back often.

Feel free to browse our collections and enjoy the wonderful images:

Andrew Nagy

Re: [CODE4LIB] Catalog Enhancements Extensions (Re: mylibrary @ockham)

2005-10-28 Thread Andrew Nagy

Wow!  Thanks for such a detailed reply ... this is awesome.

I am thinking about storing the data from the catalog in an XML database
as well, however since I know very little about these I am greatly
concerned about the scalability ... can they handle the 800,000+ records
we have in our catalog?  If I am just using it as a store, and then use
some sort of indexer, this shouldn't be a concern?

Lucene seems enticing over Zebra, since Zebra is a Z39.50 interface, which from
what I can understand will not let me do fancy searches such as what was
recently cataloged in the past 7 days, etc.
What about Xapian or XTF, did you test these out at all?  I guess lucene
seems like a better product because it is an apache project?
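A "cataloged in the past 7 days" search is exactly the kind of thing a Solr/Lucene index handles with a date-range query. A small sketch of building such a request — the `cataloged` field name and the local Solr URL are hypothetical:

```python
from urllib.parse import urlencode

def new_items_query(solr_base, days=7):
    # Build a Solr select URL for records cataloged in the last <days> days.
    # "cataloged" is a hypothetical date field in the index.
    params = {"q": f"cataloged:[NOW-{days}DAYS TO NOW]",
              "sort": "cataloged desc"}
    return solr_base + "/select?" + urlencode(params)

url = new_items_query("http://localhost:8983/solr")
print(url)
```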

Thanks for all the Info!


Ross Singer wrote:

This is pretty similar to the project that Art Rhyno and I have been
working on for a couple of months now.  Thankfully, I just got the
go-ahead to make it the top development priority, so hopefully we'll
actually have something to see in the near future.  Like Eric, we don't
have any problem with (and aren't touching) any of the backend
stuff (cataloging, acq, circ), but have major issues with the public

Although the way we're extracting records from our catalog is a little
different (and there are reasons for it), the way I would recommend
getting the data out of the opac is not via z39.50, but through
whatever sort of marcdump utility your ILS has.  You can then use
marc4j (or something similar) to transform the marc to xml (we're going
to MODS, for example).  We're currently just writing this dump
to a filesystem (broken up by LCC... again, there are reasons that
don't exactly apply to this project), but I anticipate this will
eventually go into a METS record and a Berkeley xmldb for storage.  For
indexing, we're using Lucene (Art is accessing it via Cocoon, I am
through PyLucene) and we're, so far, pretty happy with the results.

If Lucene has issues, we'll look at Zebra (as John mentioned), although
Zebra's indexes are enormous.  The nice thing about Zebra, though, is
that it would forgo the need for the Berkeley DB, since it stores the
XML record.  The built-in Z39.50 server is a nice bonus, as well.
Backups would be XTF and
Xapian.  Swish-e isn't really an option since it can't index utf-8.

The idea then is to be able to make stronger relationships between our
site's content... eliminate the silos.  A search that brings back a
couple of items that are in a particular subject guide would get a link
to the subject... or at least links to the other top items from that
guide (good tie in with MyLibrary, Eric).  Something that's on reserve
would have links to reserve policies or a course guide for that course
or whatever.

Journals would have links to the databases they are indexed in.

Yes, there's some infrastructure that needs to be worked out... :)

But the goal is to have something to at least see by the end of the
year (calendar, not school).

We'll see :)


On Oct 27, 2005, at 5:58 PM, Eric Lease Morgan wrote:

On Oct 27, 2005, at 2:06 PM, Andrew Nagy wrote:

I have been thinking of ways, similar to what you have done (as you
mentioned below with the Ockham project), to allow more modern-day
interaction with our library catalog.  I have begun thinking about
a way to index/harvest our entire catalog (and let this indexing
process run every so often) to allow our own custom access methods.
We could then generate our own custom RSS feeds of new books, allow
efficient/enticing search interfaces, etc.

Do you know of any existing software for indexing or harvesting a
catalog into another datastore (SQL Database, XML Database, etc).
I am
sure I could fetch all of the records somehow through Z39.50 and
dump it
into a MySQL database, but maybe there is some better method?
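The "dump it into a database" idea above can be sketched with the standard library's sqlite3 module standing in for MySQL. The schema and sample rows are invented for illustration; in practice the rows would come from a Z39.50 fetch or a MARC dump.

```python
# Stand-in sketch: load harvested catalog records into a relational
# store, then pull a "new books" list suitable for an RSS feed.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id TEXT PRIMARY KEY, title TEXT, added DATE)"
)

# Pretend these rows came out of the catalog via Z39.50 or a MARC dump.
harvested = [
    ("b1001", "Unicode Demystified", "2005-10-01"),
    ("b1002", "Lucene in Action", "2005-10-20"),
]
conn.executemany("INSERT INTO records VALUES (?, ?, ?)", harvested)

# "New books": everything added after a cutoff date, newest first.
new_books = [
    title
    for (title,) in conn.execute(
        "SELECT title FROM records WHERE added > '2005-10-15' "
        "ORDER BY added DESC"
    )
]
print(new_books)
```

Each title in `new_books` would become one `<item>` in the generated RSS feed.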

I too have thought about harvesting content from my local catalog and
providing new interfaces to the content, and I might go about this in
a number of different ways.

1. I might use OAI to harvest the content, cache it locally, and
provide services against the cache. This cache might be saved on a
file system, but more likely into a relational database.

2. I might simply dump all the MARC records from my catalog,
transform them into something more readable, say sets of HTML/XML
records, and provide services against these files.

The weakest link in my chain would be my indexer. Relational
databases are notoriously ill-equipped to handle free text searching.
Yes, you can implement it and you can use various database-specific
features to implement free text searching, but they still won't work
as well as an indexer. My only experience with indexers lies in
things like swish-e and Plucene. I sincerely wonder whether or not
these indexers would be up to the task.
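What an indexer like swish-e, Plucene, or Lucene provides, and what a relational LIKE-query does not, can be made concrete with a toy inverted index: a term-to-postings map built once and searched cheaply. This is only a teaching sketch, far simpler than any of the indexers named above.

```python
# Toy inverted index: maps each word to the set of documents containing it.
from collections import defaultdict


def build_index(docs):
    """Map each lower-cased word to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """AND-search: return ids of docs containing every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


docs = {
    1: "open source library catalog",
    2: "library metadata harvesting",
    3: "open access metadata",
}
index = build_index(docs)
print(sorted(search(index, "library")))        # docs 1 and 2
print(sorted(search(index, "open metadata")))  # doc 3 only
```

Real indexers add stemming, ranking, phrase queries, and on-disk storage on top of this basic structure.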

Supposing I could find/use an indexer that was satisfactory, I would

Re: [CODE4LIB] spelling server

2005-09-13 Thread Andrew Nagy

Seems like an awful lot of extra overhead; what's the need for a server
instead of having aspell installed along with the application?  I
always use it that way for all of my search applications, and it works
well and is really fast.  I guess I don't understand the need for a
web service.

Cool nonetheless :)


Eric Lease Morgan wrote:

What do y'all think of the idea of a spelling server -- a Web service
taking a word as input and returning a list of alternative spellings.

[EMAIL PROTECTED] has indexed about 430,000 OAI records. These records
have been grossly classified into a number of domains such as mathematics,
life science, theses & dissertations, and a master domain consisting
of all the sub-domains.

Taking a hint from Bill Moseley (of swish-e fame), I have read the
indexes, parsed out the individual words, and fed them to GNU ASPELL,
a dictionary program. It is then possible to query ASPELL and have it
return alternative spellings. We have incorporated this feature into

I could make this spell checking functionality available as a Web
service. The URL could look something like this:

The output could look something like this:

<?xml version='1.0'?>

It would then be up to the client to do with the content of the
spelling elements as they desired. For example, the client could:

  * spell check a document
  * implement a "Did You Mean?" service a la Google
  * incorporate the results into a "Find More Like This One" search
  * enhance the results of an OPAC search
  * feed selected words back to the spelling server

Alternative URLs might include:

Writing the underlying script would be easy. Articulating an XML
stream as output would be harder.
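The XML output could be sketched like this. GNU Aspell is what the post describes; as a dependency-free stand-in, this sketch uses difflib over a tiny word list (a purely hypothetical substitute) and wraps each suggestion in a spelling element as the post suggests.

```python
# Sketch of the spelling service's XML response, with difflib standing
# in for GNU Aspell and a toy word list standing in for the 430,000
# harvested OAI records.
import difflib
import xml.etree.ElementTree as ET

WORDS = ["mathematics", "metadata", "harvesting", "dissertation"]


def suggest_xml(word, n=5):
    """Return an XML document of alternative spellings for `word`."""
    root = ET.Element("suggestions", word=word)
    for alt in difflib.get_close_matches(word, WORDS, n=n):
        ET.SubElement(root, "spelling").text = alt
    return "<?xml version='1.0'?>" + ET.tostring(root, encoding="unicode")


print(suggest_xml("mathamatics"))
```

A client would then pull out the spelling elements and use them however it liked, per the list of uses above.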

What do y'all thinque? It would be fun at the very least.

Eric Lease Morgan
University Libraries of Notre Dame

(574) 631-8604

Re: [CODE4LIB] ajax

2005-06-10 Thread Andrew Nagy

The fact that XMLHttpRequest is a de facto standard and not an actual
standard worries me, though; without being accepted by the W3C, it
seems like a very volatile technology.  But if Google is using it, I
guess we are safe. :)

I'd be interested to see some examples if you create any.


Eric Lease Morgan wrote:

Ajax is a thing I'd like to play with more:

By exploiting a Javascript function called XMLHttpRequest it is
possible to create Web pages that seem more like desktop applications.
By not forcing the user to go from page to page to page it is possible
to keep the attention of users longer as well as provide a more
interactive experience. The link above describes this in more detail
and points to a number of Javascript libraries for a number of
languages enabling you to write such applications more easily.


Eric Morgan

Re: [CODE4LIB] find more like this one

2005-05-24 Thread Andrew Nagy

Binkley, Peter wrote:

Bear in mind that even in UTF-8 there is more than one way to encode an
accented character. It can be precomposed (using a single character,
e.g. U+00E9 for lower-case e-acute: this is normalization form C) or
decomposed (using a base character and a non-spacing diacritic, e.g.
U+0065 and U+0301, lower-case e plus the acute accent: this is
normalization form D). If you're searching at the byte level, you have
to be sure that your index and your search term have been normalized the
same way or they won't match. I've found this FAQ useful for this
stuff. In a Java context, we've used ICU4J to normalize stuff
(including stripping accents and normalizing case for different scripts)
for indexing and searching in UTF-8. There's also a C API, which could
presumably be incorporated into a Perl process, but no doubt there are
similar native Perl tools.
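The NFC/NFD point above can be demonstrated with the standard library's unicodedata module, a lighter-weight relative of what ICU4J does in Java:

```python
# Two encodings of the same rendered character, "e-acute".
import unicodedata

precomposed = "\u00e9"   # single character (normalization form C)
decomposed = "e\u0301"   # e plus combining acute accent (form D)

# A byte-level comparison fails even though both display identically...
assert precomposed != decomposed

# ...but normalizing both sides to the same form makes them match.
assert unicodedata.normalize("NFC", decomposed) == precomposed
assert unicodedata.normalize("NFD", precomposed) == decomposed
print("equal after normalization")
```

This is exactly why the index and the search term must be normalized the same way before matching.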

In general I think we've got to include i18n from the beginning: pay
attention to character sets of incoming data, normalize as early in the
process as possible (especially if ANSEL is involved!), use
UTF-8-compliant tools, and be consistent. Deliver UTF-8 to the browser
(this site helps with the html). This is still
not as easy as it ought to be but at least there are good open-source
tools out there.

Wow, it looks like there are some Unicode experts in our midst.  I am in
the middle of developing an international bibliographic database where
most of the titles are in languages other than EN-US.

Our database will store citations entered in via a web form since the
bibliography is in card format.  I am using MySQL 4 because of the
unicode support and collations.  I normally use postgres, but I figured
for a database that will mainly be used for searching only (very little
writes after the data has been populated) i'd give MySQL a try.

One feature we would like to offer is searching via the collations.  For
example, if I enter the phrase "francais", I would hope that any items
with the term "français" would be returned.  Is it correct to use MySQL's
collations for this?  Does anyone have experience with this?
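One way to get the accent-insensitive matching described above ("francais" finding "français") without relying on MySQL collations is to fold terms on both sides before comparing: decompose to normalization form D and drop the combining marks. This is a sketch of the technique, not a claim about how MySQL's collations work internally.

```python
# Accent folding via Unicode decomposition, using only the stdlib.
import unicodedata


def fold(text):
    """Lower-case and strip diacritics: 'Français' -> 'francais'."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


titles = ["Le français aujourd'hui", "Deutsche Grammatik"]
hits = [t for t in titles if "francais" in fold(t)]
print(hits)
```

Applying `fold` both at index time and at query time makes the match symmetric, so "français" as a query would find un-accented "francais" as well.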

I am still learning the uses of UTF-8 characters, so I am glad there are
so many of you who know so much about this on this list!