Re: [CODE4LIB] Looking for Bib and LC subject authority data

2008-02-08 Thread Clay Redding

Hi Charles-Antoine,

I can't be terribly specific about timing, but ranging anywhere
between eventually and in the near future, we here at LC will
have an alpha offering of LCSH data as SKOS RDF that should be
available to the public.  The SKOS data should be able to help with
your thesaurus hierarchy needs.  Several offices in LC are trying to
work out the delivery mechanism details right now.
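
To make the hierarchy point concrete, here is a minimal sketch of reading broader/narrower links out of SKOS RDF/XML with Python's standard library. The sample data, the example.org URIs, and the function names are made up for illustration; the LC data had not been released when this was written and may well be shaped differently.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical sample; URIs and headings are illustrative only.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/subjects/science">
    <skos:prefLabel xml:lang="en">Science</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/subjects/physics">
    <skos:prefLabel xml:lang="en">Physics</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/subjects/science"/>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/subjects/optics">
    <skos:prefLabel xml:lang="en">Optics</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/subjects/physics"/>
  </skos:Concept>
</rdf:RDF>"""


def load_hierarchy(rdf_xml):
    """Return (labels, narrower): URI -> label, URI -> list of narrower URIs."""
    root = ET.fromstring(rdf_xml)
    labels = {}
    narrower = defaultdict(list)
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        uri = concept.get(f"{{{RDF}}}about")
        label = concept.find(f"{{{SKOS}}}prefLabel")
        labels[uri] = label.text if label is not None else uri
        # Invert skos:broader so we can walk from broad terms to narrow ones.
        for broader in concept.findall(f"{{{SKOS}}}broader"):
            narrower[broader.get(f"{{{RDF}}}resource")].append(uri)
    return labels, dict(narrower)


def walk(uri, labels, narrower, depth=0):
    """Yield (depth, label) pairs, broad to narrow, starting at uri."""
    yield depth, labels[uri]
    for child in narrower.get(uri, []):
        yield from walk(child, labels, narrower, depth + 1)


labels, narrower = load_hierarchy(SAMPLE)
for depth, label in walk("http://example.org/subjects/science", labels, narrower):
    print("  " * depth + label)
```

A full RDF library would be the safer choice for real data, since RDF/XML allows several equivalent serializations that this element-by-element approach does not handle.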

Ed Summers, who is doing the lion's share of the development work,
can speak about how he crafted the expression of LCSH-SKOS.


On Feb 7, 2008, at 5:49 PM, Charles Antoine Julien, Mr wrote:

A kind fellow on NGC4Lib suggested I mention this here.

I'm developing a 3D fly-through interface for an LCSH organized
collection but I'm having difficulty finding a library willing to
give me a subset of their data (i.e., subject headings (broad to narrow
terms) and the bib records to which they have been assigned).  They
don't see why they should help me.  Their value added isn't clear to
them since this is experimental and I have no wish to turn this into a
business (I like to build and test solutions...selling them isn't my
piece of pie).

I'm planning to import the data into Access or SQL Server (depending on
how much I get) and partly normalize the bib records so subject terms for
each item are in a separate one-to-many table.  I also need the
authority data to establish where each subject term (and its
bib records) resides in the broad to narrow term hierarchy...this is
more useful in the sciences, which seem to run 4-6 levels deep.
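
The table layout described above could look roughly like the following sketch. It uses SQLite from Python's standard library rather than Access or SQL Server, purely so the example is self-contained; the table names, column names, and sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject (
    id INTEGER PRIMARY KEY,
    heading TEXT NOT NULL,
    broader_id INTEGER REFERENCES subject(id)  -- NULL for a top-level term
);
CREATE TABLE bib_record (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
-- One row per (record, subject) pair: the "many" side of each bib record.
CREATE TABLE record_subject (
    record_id INTEGER NOT NULL REFERENCES bib_record(id),
    subject_id INTEGER NOT NULL REFERENCES subject(id),
    PRIMARY KEY (record_id, subject_id)
);
""")

# Hypothetical sample rows: a three-level branch and two records.
conn.executemany("INSERT INTO subject VALUES (?, ?, ?)",
                 [(1, "Science", None), (2, "Physics", 1), (3, "Optics", 2)])
conn.executemany("INSERT INTO bib_record VALUES (?, ?)",
                 [(10, "Introductory optics"), (11, "A survey of physics")])
conn.executemany("INSERT INTO record_subject VALUES (?, ?)",
                 [(10, 3), (11, 2)])


def titles_under(conn, subject_id):
    """Titles assigned to a subject or to any narrower term beneath it."""
    rows = conn.execute("""
        WITH RECURSIVE subtree(id) AS (
            SELECT ?
            UNION ALL
            SELECT s.id FROM subject s JOIN subtree t ON s.broader_id = t.id
        )
        SELECT DISTINCT b.title
        FROM subtree t
        JOIN record_subject rs ON rs.subject_id = t.id
        JOIN bib_record b ON b.id = rs.record_id
        ORDER BY b.title
    """, (subject_id,)).fetchall()
    return [row[0] for row in rows]


print(titles_under(conn, 1))  # everything under "Science"
```

A single broader_id column assumes each heading has one broader term. If headings turn out to have several, a separate subject-to-subject link table would be needed.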

Jonathan Rochkind (kind fellow in question) suggested the following

-I could access data directly through Z39.5...

-I could take LC subject authority data in MARC format from a
grey-area-legal source

-I could take bib records (and their associated LCSH terms) from Particularly:

In particular, the Barton collection. That will be in the MODS
which will actually be easier to work with than library standard MARC.


Obviously I'm not looking forward to parsing MARC data although I've
heard there are scripts for this.
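
For the MODS route suggested above, pulling a title and its LCSH topics out of a record takes only a few lines of standard-library Python. This is a sketch under assumptions: the sample record is invented, and real MODS subjects can also carry geographic, temporal, and genre subelements that this ignores.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical record, trimmed to the elements used below.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Introductory optics</title></titleInfo>
  <subject authority="lcsh"><topic>Optics</topic></subject>
  <subject authority="lcsh">
    <topic>Physics</topic>
    <topic>Study and teaching</topic>
  </subject>
</mods>"""


def title_and_subjects(mods_xml):
    """Return (title, [subject strings]) for one MODS record."""
    root = ET.fromstring(mods_xml)
    title = root.findtext("mods:titleInfo/mods:title",
                          default="", namespaces=MODS_NS)
    subjects = []
    for subject in root.findall("mods:subject[@authority='lcsh']", MODS_NS):
        # Join subdivided headings with "--"; only <topic> parts are read here.
        parts = [t.text.strip()
                 for t in subject.findall("mods:topic", MODS_NS) if t.text]
        if parts:
            subjects.append("--".join(parts))
    return title, subjects


print(title_and_subjects(SAMPLE_MODS))
```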

Additional suggestions and/or comments would be greatly appreciated.

Thanks a bunch,

Charles-Antoine Julien

Ph.D Candidate

School of Information Studies

McGill University

Re: [CODE4LIB] xml java package

2008-02-01 Thread Clay Redding

I don't know if it's still the case, but I know a recent EAD project
that tried to use Castor said that it had problems with mixed content
models.  -- Clay

On Feb 1, 2008, at 10:50 AM, Riley, Jenn wrote:

-Original Message-

I now need to read XML. Unlike indexing and doing OAI-PMH, there are
a myriad of tools for reading and writing XML. I've done SAX before.
I think I've done a bit of DOM. If I wanted a straight-forward and
well-supported Java package that supported these APIs, then what
package might I use?

If the data you're manipulating is partially or fully described by a
Schema or DTD, consider using a package such as Castor (

I think I recall hearing in the past that Castor had trouble with
XML files that used mixed content models (a set into which TEI and
EAD both fall) - can anyone confirm if that's currently the case
(or that it never was and I'm completely misremembering)?
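
For anyone unsure what "mixed content" means in practice, here is a small language-neutral illustration in Python. It says nothing about Castor's actual behavior, which I can't confirm; it only shows why tools that map each element to one object field have a hard time when text and child elements are interleaved and their order matters. The fragment is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical EAD-like fragment: text and child elements are interleaved.
FRAGMENT = ("<p>Letters from <persname>Jane Doe</persname> dated "
            "<date>1843</date>, with notes.</p>")


def flatten(element):
    """Return the element's content as an ordered list of (kind, value) pieces."""
    pieces = []
    if element.text:
        pieces.append(("text", element.text))
    for child in element:
        pieces.append((child.tag, child.text or ""))
        # Text following a child belongs to the parent, but is stored on the child.
        if child.tail:
            pieces.append(("text", child.tail))
    return pieces


for kind, value in flatten(ET.fromstring(FRAGMENT)):
    print(f"{kind:9} {value!r}")
```

A binding that turned the paragraph into an object with one persname field and one date field would keep the two values but lose the three runs of text around them, along with their order.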


Jenn Riley
Metadata Librarian
Digital Library Program
Indiana University - Bloomington
Wells Library W501
(812) 856-5759

Inquiring Librarian blog:

[CODE4LIB] Georeferencing with open source tools

2007-12-05 Thread Clay Redding

Hi all,

This is for any of you GIS specialists out there!  Please help me if
possible, and feel free to email backchannel since this is somewhat
off-topic for both listservs.  I'm a bit of a newbie to doing GIS and
geomatics work.  As part of an effort to enable geographical browse,
search, and retrieval for some digital projects at LoC, I'm trying to
georeference a historical map [1] of the continental United States,
as well as parts of Canada and Mexico.

I plan to serve the map as a WMS layer, then pull it in using the
OpenLayers JS library inside our site for the end user navigation.  I
aim to georectify the image so that it can be placed over top of a
more modern map layer along the lines of Yahoo! Map Mixer [2] or
MetaCarta's Rectifier [3].  At this point I'll merely work with
raster data, since vector data isn't needed yet.
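
For context, the request a WMS client such as OpenLayers sends for each map image looks roughly like the URL built below. This is a sketch only: the host, path, and layer name are hypothetical, and the parameters follow WMS 1.1.1 as I understand it.

```python
from urllib.parse import urlencode


def getmap_url(base_url, layer, bbox, width, height, srs="EPSG:4326"):
    """Build a WMS 1.1.1 GetMap URL; bbox is (minx, miny, maxx, maxy)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "TRANSPARENT": "TRUE",  # lets the historical map sit over a base layer
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server and layer name.
print(getmap_url("http://maps.example.org/geoserver/wms",
                 "historical:us_map", (-125, 24, -66, 50), 800, 400))
```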

Here are the requirements for this project:

-- I am required to use this historical map as the basis of my work,
as opposed to something more modern like Google Maps on its own.

-- I have to use open source tools, as I don't have ESRI tools at my
disposal.  Thus far my toolkit includes: Quantum GIS and uDig, GDAL,
PROJ.4, PostGIS, GRASS (although I'm not skilled at GRASS), and
Geoserver with PostGIS for serving the content.

Here's what I (think I) know, based on meeting with a GIS specialist
here at LC who has since departed for a new job:

-- The map has an Albers Equal Area projection, with NAD83 datum, and
GRS80 spheroid. (In proj or cs2cs: +proj=aea +lat_1=20 +lat_2=60
+lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m
+no_defs).  She determined this by opening the map in ArcGIS, and
it's ESRI Well Known Text (WKT) 102003.

-- The physical map was once folded, and might be throwing off some
of the measurements.

Using both gdal_translate (on the shell) and Quantum GIS (GUI)
georeference plugin, I've plotted upward of 40 ground control points
throughout the USA, Canada, Mexico, and Cuba.  I've found the WGS84
latlong coordinates on databases such as the GNIS as well as the
USGS.  US points are plotted to 5 decimal places of precision.
International points have 2 decimal places.  I run these through
cs2cs to convert latlong WGS84 to the Albers/NAD83 coordinates.
After warping the image based on these Albers/NAD83 ground control
points, I'll still end up as much as 50 miles off on my north/south
results, and around 3-4 miles east/west.
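
To spell out the workflow just described, here is a sketch that assembles the two GDAL command lines from a list of ground control points. It only builds the commands and does not run them. The file names and GCP values are hypothetical, and the choice of a first-order polynomial and bilinear resampling is an assumption for the example, not a recommendation.

```python
import shlex

# Projection string as given earlier in this message.
ALBERS = ("+proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 "
          "+x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs")


def gdal_commands(gcps, src, tagged, warped, srs=ALBERS, order=1):
    """gcps: list of (pixel, line, easting, northing) in the target SRS."""
    # Step 1: attach the control points and SRS to a copy of the scan.
    translate = ["gdal_translate", "-of", "GTiff", "-a_srs", srs]
    for pixel, line, easting, northing in gcps:
        translate += ["-gcp", str(pixel), str(line),
                      str(easting), str(northing)]
    translate += [src, tagged]
    # Step 2: warp using a polynomial of the given order fitted to the GCPs.
    warp = ["gdalwarp", "-order", str(order), "-r", "bilinear",
            "-t_srs", srs, tagged, warped]
    return translate, warp


# Hypothetical control points, already converted to Albers meters with cs2cs.
gcps = [(512, 2048, -2100000.0, 500000.0),
        (6400, 1900, 1800000.0, 650000.0),
        (3300, 4100, 100000.0, -900000.0)]
for command in gdal_commands(gcps, "scan.tif", "scan_gcp.tif", "scan_albers.tif"):
    print(shlex.join(command))
```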

My question primarily has to do with best practices for
georeferencing a map that covers such a large area as North
America.  Have I used enough points?  Should I use more?  Or have I
already committed overkill?  Or, do I even have the right projection
information?  Would I be better off using some other tool(s)?
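
One way to approach the "enough points?" question is to fit a simple transform to the control points and look at each point's residual, since one or two bad points can do more harm than a few missing ones. The sketch below fits a first-order (affine) transform by least squares in plain Python. It is a diagnostic under assumptions, not what any particular GIS tool computes, and the sample points are invented; a folded paper map may need a higher-order or spline fit.

```python
import math


def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def fit_affine(gcps):
    """Least-squares fit of easting and northing as c0 + c1*pixel + c2*line."""
    rows = [[1.0, p, l] for p, l, _, _ in gcps]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]

    def coefficients(k):
        aty = [sum(r[i] * g[k] for r, g in zip(rows, gcps)) for i in range(3)]
        return solve3(ata, aty)

    return coefficients(2), coefficients(3)


def residuals(gcps):
    """Distance (in SRS units) between each GCP and where the fit puts it."""
    ce, cn = fit_affine(gcps)
    out = []
    for p, l, e, n in gcps:
        fitted_e = ce[0] + ce[1] * p + ce[2] * l
        fitted_n = cn[0] + cn[1] * p + cn[2] * l
        out.append(math.hypot(e - fitted_e, n - fitted_n))
    return out


# Hypothetical GCPs (pixel, line, easting, northing); the last one is off
# by 300 m in northing to show how a bad point stands out.
gcps = [(0, 0, 1000.0, 5000.0), (100, 0, 1200.0, 5000.0),
        (0, 100, 1000.0, 4800.0), (100, 100, 1200.0, 4800.0),
        (50, 50, 1100.0, 5200.0)]
for gcp, r in zip(gcps, residuals(gcps)):
    print(gcp[:2], round(r, 1))
```

Dropping the worst point and refitting shows whether the remaining residuals shrink. If they stay large and point in a consistent direction, that suggests the projection parameters rather than the points.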

Thanks for any help you can provide.  I can provide more info, the
raster image, etc., if needed.



Re: [CODE4LIB] Georeferencing with open source tools

2007-12-05 Thread Clay Redding

On Dec 5, 2007, at 3:18 PM, Michael J. Giarlo wrote:

I think I understood about four words of this.

That sounds about right. (/me ducks, jokingly)

Good luck, and rock on!


I'm not even sure if *I* understand it.  I'm beginning to take a
Homer Simpson-esque (as Sanitation Commissioner [1] ) approach to it:

Can't someone else do it?



Re: [CODE4LIB] RFC 5005 ATOM extension and OAI

2007-10-24 Thread Clay Redding

Hi Peter,

I completely agree with everything you just wrote, especially about
Atom + APP being more than just a technology for blogs.  APP is a
great lightweight alternative to WebDAV, and promising for all sorts
of data transfer.  The fact that it has developer groundswell is a
huge plus.  During my Princeton days Kevin Clarke and I briefly
talked about what a METS + APP metadata editing application could
do.  (I can't remember the answer, but I bet it would be snazzy.)

To stay on the OAI theme, I sometimes wish the activity of sharing
metadata used a push technology like APP instead of the OAI pull/
harvest approach that we use today.   One of the reasons is that I
feel it would be easier for content providers to handle deleted
records with HTTP DELETE, simply because they would know to whom
they had PUT or POSTed their
metadata.  Service providers wouldn't have to support deleted
records, they'd just have to reindex.
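
The push model I have in mind can be sketched as follows. The requests are built but never sent, and the URIs and the registry of pushed records are hypothetical. The idea is simply that a provider who remembers where it POSTed each entry can issue a DELETE to each of those member URIs.

```python
import urllib.request


def build_push(collection_uri, atom_entry_xml):
    """Build (but do not send) an AtomPub POST that creates a new member."""
    return urllib.request.Request(
        collection_uri,
        data=atom_entry_xml.encode("utf-8"),
        headers={"Content-Type": "application/atom+xml;type=entry"},
        method="POST",
    )


def build_delete(member_uri):
    """Build (but do not send) an HTTP DELETE for one previously pushed entry."""
    return urllib.request.Request(member_uri, method="DELETE")


# Hypothetical registry kept by the content provider:
# local record id -> member URIs returned by each service provider.
pushed = {
    "rec-42": ["https://harvester-a.example.org/entries/42",
               "https://harvester-b.example.org/items/rec-42"],
}


def deletes_for(record_id, registry):
    """One DELETE request per place the record was pushed to."""
    return [build_delete(uri) for uri in registry.get(record_id, [])]


for request in deletes_for("rec-42", pushed):
    print(request.get_method(), request.full_url)
```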

I came to this realization out of frustration that most OAI toolkits
(at the time, ca. 2005) didn't support that functionality well -- or
at all.  I don't know if that's still the case.  However, the need to
delete records is a reality for most projects, and OAI has somewhat
awkwardly made us rethink how to delete a record in repositories
and the like, both on the service and data provider end.   You almost
have to build your entire system around handling deleted records
just for OAI exposure.   In reality it seems like you just end up
masquerading or re-representing its outward visibility on our local
systems, which gets onerous.
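
For comparison, this is roughly what the pull side has to do. In OAI-PMH a deleted record shows up as a header with status="deleted", and only if the repository chooses to track deletions. The sketch below separates live from deleted identifiers in a ListRecords response; the sample response is invented and trimmed.

```python
import xml.etree.ElementTree as ET

OAI = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical, trimmed ListRecords response.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2007-10-01</datestamp>
      </header>
      <metadata/>
    </record>
    <record>
      <header status="deleted">
        <identifier>oai:example.org:2</identifier>
        <datestamp>2007-10-20</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""


def split_records(response_xml):
    """Return (live_ids, deleted_ids) from one ListRecords response."""
    root = ET.fromstring(response_xml)
    live, deleted = [], []
    for header in root.iterfind(".//oai:record/oai:header", OAI):
        identifier = header.findtext("oai:identifier", namespaces=OAI)
        if header.get("status") == "deleted":
            deleted.append(identifier)  # harvester should drop this from its index
        else:
            live.append(identifier)
    return live, deleted


print(split_records(SAMPLE_RESPONSE))
```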

I guess the difference is that the growing number of Atom developers
are heeding the requirement for deletions, whereas the few existing
OAI toolkit developers have deemed that functionality optional.

Long winded as usual,

On Oct 24, 2007, at 12:51 AM, pkeane wrote:

This conversation about Atom is, I think, really an important one to
have.  As well designed and thought out as protocols and standards such as
METS (and the budding OAI-ORE spec) are, they don't have that viral
technology attribute of utter simplicity.  [snipped]

I see numerous advantages to
standardizing on Atom for any number of outward-facing
services/end-points. I think it would be sad if Atom and AtomPub were
seen only as technologies used by and for blogs/blogging.

Re: [CODE4LIB] code4lib lucene pre-conference

2006-11-28 Thread Clay Redding

I'm sure most of you have seen this, but there is a lot of good work
going on regarding XQuery full text searching by the W3C.  LC is pushing
a lot of the activity in this group, and using hefty document-centric
EAD examples in the testing.

FWIW, traditionally I've been a fan of utilizing an indexing tool that
is independent from my storage.  But the indexing (a subset of Lucene)
that is embedded in the NXDB (X-Hive) and expressed in XQuery in use at
Princeton is good.  It changed my opinions a bit about having the layers
separated, and I now think that XQuery Full Text has a chance.  We only
had to switch to the full, independent Lucene to implement some features
such as weighting, etc., that the NXDB didn't include off the shelf.

Regardless, though, having a standards-based syntax for querying is a
good thing.  Or, to put it another way, at least it doesn't hurt.  Those
that don't wish to interact with an index due to standards overhead
don't necessarily have to do so.  But for some, it will fit the bill by
allowing them to put in new backends and simply plug into the standard.
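
As a rough illustration of that last point, here is a toy sketch of calling code that depends only on a fixed search interface, so the backend behind it can be replaced. It is not XQuery Full Text or Lucene; the interface, class, and function names are all made up for the example.

```python
from typing import Protocol


class Index(Protocol):
    """The only contract calling code relies on."""

    def add(self, doc_id: str, text: str) -> None: ...

    def search(self, query: str) -> list[str]: ...


class TinyInvertedIndex:
    """A toy backend: lowercase whitespace tokens, all terms must match."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = {}

    def add(self, doc_id: str, text: str) -> None:
        for token in text.lower().split():
            self._postings.setdefault(token, set()).add(doc_id)

    def search(self, query: str) -> list[str]:
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(hits)


def find_documents(index: Index, query: str) -> list[str]:
    """Application code: knows the interface, not the backend."""
    return index.search(query)


index = TinyInvertedIndex()
index.add("ead1", "Papers of the optics society")
index.add("ead2", "Physics papers")
print(find_documents(index, "optics papers"))
```

Any other backend that offers the same two methods could be passed to find_documents without changing it, which is the kind of swap a standard query syntax is meant to allow.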


Kevin S. Clarke wrote:

On 11/27/06, Ross Singer [EMAIL PROTECTED] wrote:

On 11/27/06, Kevin S. Clarke [EMAIL PROTECTED] wrote:

Seriously, please don't get hung up on the 'proprietary'-ness of
Lucene's query syntax.  It's open, it's widely used, and has been
ported to a handful of languages.  I mean, why would you trade off
something that works well /now/ and will most likely only get better
for something that you admit sort of sucks?

It's not that fulltext for XQuery sucks... it just doesn't exist
(right now people do it through extensions to the language).  I would
expect that the spec that gets written will not be that far from
Lucene's syntax.  You are talking about the syntax that goes into the
search box right?  I don't expect an XQuery fulltext spec will change
that -- it is just how you pass that along to Lucene that will be
different (e.g., do you do it in Java, in Ruby, in XML via Solr, do
you do it in XQuery, etc.)

And I agree with Erik's assessment that it's better to keep your
repository and index separated for exactly the sort of scenario you
worry about.  If a super-duper new indexer comes along, you can always
just switch to it, then.

How do you switch to it?  How do the pieces talk?  This is the point
of standards.  If there is a standard way of addressing an index then
you don't have to care what the newest greatest indexer is.  This
paragraph seems in contrast to your one above.


[CODE4LIB] Princeton University Library job posting

2006-03-03 Thread Clay Redding

Hi all,

I invite you to have a look at a new job opening at Princeton University
Library.  The Digital Library Specialist position is part of the Digital
Library Operations Group, and the nature of the work will heavily focus
on native XML database development:  XQuery, XForms, etc.  Development
will largely pertain to working with METS ingest and dissemination
(wrapping MODS/MADS, EAD/EAC, TEI, VRA Core, DDI, and the like) in a
native XML environment, as shown in the library's Digital Collections
site [1].  Workflow management and business process development will
also be a large focus of this position.

The full job description can be found on the Princeton Library HR website:

Clay Redding