Re: [CODE4LIB] Looking for Bib and LC subject authority data
Hi Charles-Antoine,

I can't be terribly specific about timing, but sometime between "eventually" and "in the near future," we here at LC will have an alpha offering of LCSH data as SKOS RDF available to the public. The SKOS data should help with your thesaurus hierarchy needs. Several offices in LC are working out the delivery mechanism details right now. Ed Summers, who is doing the lion's share of the development work, can speak about how he crafted the expression of LCSH-SKOS.

Clay

On Feb 7, 2008, at 5:49 PM, Charles Antoine Julien, Mr wrote:

A kind fellow on NGC4Lib suggested I mention this here. I'm developing a 3D fly-through interface for an LCSH-organized collection, but I'm having difficulty finding a library willing to give me a subset of their data (i.e., subject headings (broad to narrow terms) and the bib records to which they have been assigned). They just don't see why they should help me. The value added isn't clear to them, since this is experimental and I have no wish to turn it into a business (I like to build and test solutions...selling them isn't my piece of pie). I'm planning to import the data into Access or SQL Server (depending on how much I get) and partly normalize the bib records so that the subject terms for each item are in a separate one-to-many table. I also need the authority data to establish where each subject term (and its associated bib records) resides in the broad-to-narrow term hierarchy...this is more useful in the sciences, which seem to go 4-6 levels deep. Jonathan Rochkind (the kind fellow in question) suggested the following:
-I could access data directly through Z39.5...
-I could take LC subject authority data in MARC format from a certain grey-area-legal source
-I could take bib records (and their associated LCSH terms) from http://simile.mit.edu/wiki/Dataset_Collection Particularly: http://simile.mit.edu/rdf-test-data/barton/compressed/ In particular, the Barton collection.
That will be in the MODS format, which will actually be easier to work with than library-standard MARC. Or http://www.archive.org/details/marc_records_scriblio_net

Obviously I'm not looking forward to parsing MARC data, although I've heard there are scripts for this. Additional suggestions and/or comments would be greatly appreciated.

Thanks a bunch,
Charles-Antoine Julien
Ph.D. Candidate
School of Information Studies
McGill University
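Since the Barton records come as MODS, the one-to-many subject table described above can be pulled out with an ordinary XML parser rather than a MARC toolkit. A minimal sketch using Python's standard-library ElementTree, assuming each record carries its identifier in mods:recordInfo/mods:recordIdentifier and marks LCSH headings with authority="lcsh" (the actual Barton files may differ on both points); the function name subject_rows is hypothetical, and name subjects are skipped for brevity.

```python
import xml.etree.ElementTree as ET

MODS = "http://www.loc.gov/mods/v3"
NS = {"mods": MODS}
# Subject subelements whose text makes up the heading (name subjects are skipped in this sketch)
SUBDIVISIONS = {f"{{{MODS}}}{tag}" for tag in ("topic", "geographic", "temporal", "genre")}


def subject_rows(mods_xml):
    """Return (record_id, heading) rows: one row per LCSH subject of each record."""
    root = ET.fromstring(mods_xml)
    # Accept either a <modsCollection> of <mods> records or a single <mods> record
    if root.tag == f"{{{MODS}}}modsCollection":
        records = root.findall("mods:mods", NS)
    else:
        records = [root]
    rows = []
    for record in records:
        # Assumption: the record identifier lives in recordInfo/recordIdentifier
        rec_id = record.findtext(
            "mods:recordInfo/mods:recordIdentifier", default="", namespaces=NS
        )
        for subject in record.findall("mods:subject", NS):
            if subject.get("authority") != "lcsh":
                continue
            parts = [
                child.text.strip()
                for child in subject
                if child.tag in SUBDIVISIONS and child.text and child.text.strip()
            ]
            if parts:
                # Join subdivisions with "--", the usual LCSH display convention
                rows.append((rec_id, "--".join(parts)))
    return rows
```

The resulting rows could then be written out with the csv module and loaded into Access or SQL Server as the subject table, keyed on the record identifier.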
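For the hierarchy question, once the broader-term links are in hand (for example from skos:broader statements in the SKOS data mentioned above), each heading's level can be computed by walking up those links. A minimal sketch, assuming the links have already been loaded into a dict mapping each term to a list of its broader terms; term_depths is a hypothetical name, and the headings in the test are made up rather than actual LCSH relationships. Because a heading can have more than one broader term, this version takes the longest path up to a top term as its depth; the shortest path would be an equally defensible choice.

```python
def term_depths(broader):
    """Map every term to its depth; top terms (no broader term) have depth 1."""
    depths = {}

    def depth(term, visiting):
        if term in depths:
            return depths[term]
        if term in visiting:
            raise ValueError(f"cycle in broader-term links at {term!r}")
        visiting.add(term)
        # Longest path upward: 1 + the deepest broader term (0 if there is none)
        d = 1 + max((depth(p, visiting) for p in broader.get(term, ())), default=0)
        visiting.discard(term)
        depths[term] = d
        return d

    # Include terms that only ever appear as someone's broader term
    terms = set(broader) | {p for parents in broader.values() for p in parents}
    for term in terms:
        depth(term, set())
    return depths
```

With the depths computed, each subject row from the bib data can be placed at its level of the fly-through hierarchy by a simple lookup.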
Re: [CODE4LIB] xml java package
I don't know if it's still the case, but a recent EAD project that tried to use Castor reported problems with mixed content models.

--
Clay

On Feb 1, 2008, at 10:50 AM, Riley, Jenn wrote:

I now need to read XML. Unlike indexing and doing OAI-PMH, there are a myriad of tools for reading and writing XML. I've done SAX before. I think I've done a bit of DOM. If I wanted a straightforward and well-supported Java package that supported these APIs, what package might I use?

If the data you're manipulating is partially or fully described by a Schema or DTD, consider using a package such as Castor (castor.org).

I think I recall hearing in the past that Castor had trouble with XML files that used mixed content models (a set into which TEI and EAD both fall). Can anyone confirm whether that's currently the case (or that it never was and I'm completely misremembering)?

Jenn

Jenn Riley
Metadata Librarian
Digital Library Program
Indiana University - Bloomington
Wells Library W501
(812) 856-5759
www.dlib.indiana.edu
Inquiring Librarian blog: www.inquiringlibrarian.blogspot.com
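To make "mixed content" concrete: text and child elements interleave inside a single element, and the order of the pieces carries meaning. A data-binding approach that maps each child element type to its own field has no natural place to keep the surrounding text in sequence, which is the kind of trouble described above. The sketch below uses Python's standard-library ElementTree purely for brevity, on a made-up EAD-style fragment; content_sequence is a hypothetical helper. For what it's worth, the JDK's own JAXP API (javax.xml.parsers) provides both SAX and DOM, and a DOM tree exposes the same paragraph as alternating Text and Element child nodes.

```python
import xml.etree.ElementTree as ET

# A made-up, EAD-style paragraph: text and <persname> elements interleave
para = ET.fromstring(
    "<p>Letters from <persname>Jane Doe</persname> to "
    "<persname>John Roe</persname>, 1850-1860.</p>"
)


def content_sequence(elem):
    """List an element's direct content in document order as (kind, value) pairs."""
    seq = []
    if elem.text:
        seq.append(("text", elem.text))
    for child in elem:
        # Use all text inside the child, including any nested elements
        seq.append((child.tag, "".join(child.itertext())))
        # ElementTree stores the text that follows a child as that child's .tail
        if child.tail:
            seq.append(("text", child.tail))
    return seq


for kind, value in content_sequence(para):
    print(kind, repr(value))
```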
[CODE4LIB] Georeferencing with open source tools
Hi all,

This is for any of you GIS specialists out there! Please help me if possible, and feel free to email me backchannel, since this is somewhat off-topic for both listservs.

I'm a bit of a newbie to GIS and geomatics work. As part of an effort to enable geographical browse, search, and retrieval for some digital projects at LoC, I'm trying to georeference a historical map [1] of the continental United States, as well as parts of Canada and Mexico. I plan to serve the map as a WMS layer, then pull it into our site with the OpenLayers JS library for end-user navigation. I aim to georectify the image so that it can be placed over a more modern map layer, along the lines of Yahoo! Map Mixer [2] or MetaCarta's Rectifier [3]. For now I'll work only with raster data, since vector data isn't needed yet.

Here are the requirements for this project:
-- I am required to use this historical map as the basis of my work, as opposed to something more modern like Google Maps on its own.
-- I have to use open source tools, as I don't have ESRI tools at my disposal.

Thus far my toolkit includes: Quantum GIS and uDig, GDAL, PROJ.4, PostGIS, GRASS (although I'm not skilled at GRASS), and Geoserver with PostGIS for serving the content.

Here's what I (think I) know, based on meeting with a GIS specialist here at LC who has since departed for a new job:
-- The map has an Albers Equal Area projection, with NAD83 datum and GRS80 spheroid (in proj or cs2cs: +proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs). She determined this by opening the map in ArcGIS; its ESRI Well Known Text (WKT) identifier is 102003.
-- The physical map was once folded, which might be throwing off some of the measurements.

Using both gdal_translate (on the shell) and the Quantum GIS georeferencer plugin (GUI), I've plotted upwards of 40 ground control points throughout the USA, Canada, Mexico, and Cuba.
I've found the WGS84 lat/long coordinates in databases such as GNIS, as well as from the USGS. US points are plotted to 5 decimal places of accuracy; international points have 2 decimal places. I run these through cs2cs to convert the WGS84 lat/long to Albers/NAD83 coordinates. After warping the image based on these Albers/NAD83 ground control points, I still end up as much as 50 miles off in my north/south results, and around 3-4 miles east/west.

My question is primarily about best practices for georeferencing a map that covers an area as large as North America. Have I used enough points? Should I use more? Or have I already committed overkill? Or do I even have the right projection information? Would I be better off using some other tool(s)?

Thanks for any help you can provide. I can provide more info, the raster image, etc., if needed.

Clay

[1] http://hdl.loc.gov/loc.gmd/g3301p.ct001842
[2] http://maps.yahoo.com/mapmixer
[3] http://labs.metacarta.com/rectifier/
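One way to check the projection information independently of cs2cs is to project a few control points by hand. As far as I know, the parameters in the proj string above (standard parallels 20 and 60, origin 40N 96W) match ESRI's North America Albers Equal Area Conic definition (102008), whereas ESRI 102003 is USA Contiguous Albers Equal Area Conic (standard parallels 29.5 and 45.5, origin 37.5N 96W), so it may be worth confirming which one the map actually uses. The WGS84-to-NAD83 step cannot explain errors of tens of miles, since the two datums differ by only a metre or two across North America. A minimal sketch of the ellipsoidal Albers forward equations as given in Snyder's Map Projections: A Working Manual, on GRS80; albers_forward is a hypothetical helper with false easting/northing assumed to be 0, not a replacement for PROJ.

```python
import math

# GRS80 ellipsoid (+ellps=GRS80)
A = 6378137.0
F = 1 / 298.257222101
E2 = 2 * F - F * F
E = math.sqrt(E2)


def _q(phi):
    """Snyder's q(phi) for the ellipsoid."""
    s = math.sin(phi)
    return (1 - E2) * (s / (1 - E2 * s * s)
                       - (1 / (2 * E)) * math.log((1 - E * s) / (1 + E * s)))


def _m(phi):
    """Snyder's m(phi) = cos(phi) / sqrt(1 - e^2 sin^2(phi))."""
    s = math.sin(phi)
    return math.cos(phi) / math.sqrt(1 - E2 * s * s)


def albers_forward(lat, lon, lat_1=20.0, lat_2=60.0, lat_0=40.0, lon_0=-96.0):
    """Project geodetic lat/lon in degrees to Albers Equal Area x/y in metres."""
    phi, phi1, phi2, phi0 = map(math.radians, (lat, lat_1, lat_2, lat_0))
    m1, m2 = _m(phi1), _m(phi2)
    q1, q2, q0, q = _q(phi1), _q(phi2), _q(phi0), _q(phi)
    n = (m1 * m1 - m2 * m2) / (q2 - q1)
    c = m1 * m1 + n * q1
    rho = A * math.sqrt(c - n * q) / n
    rho0 = A * math.sqrt(c - n * q0) / n
    theta = n * math.radians(lon - lon_0)
    return rho * math.sin(theta), rho0 - rho * math.cos(theta)
```

Projecting the same control point with both parameter sets (for example albers_forward(lat, lon) versus albers_forward(lat, lon, 29.5, 45.5, 37.5, -96.0)) and comparing against the cs2cs output shows quickly whether the two definitions have been mixed up somewhere in the chain.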
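On whether 40 points are enough: the count matters less than how well each point agrees with the transform the image is warped with. Looking at per-point residuals helps here: one or two points far above the rest suggest a typo or a local distortion such as the fold, while large residuals spread across the whole map point instead at the projection parameters or the transform order. A minimal sketch of a first-order (affine) least-squares fit with residuals in plain Python; fit_affine and _solve3 are hypothetical helpers, and the (col, row, X, Y) tuple layout for a control point is an assumption. GDAL itself offers higher-order polynomial fits and thin-plate splines through gdalwarp's -order and -tps options.

```python
import math


def _solve3(m, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, b)]
    for k in range(3):
        pivot = max(range(k, 3), key=lambda r: abs(a[r][k]))
        a[k], a[pivot] = a[pivot], a[k]
        for r in range(k + 1, 3):
            factor = a[r][k] / a[k][k]
            for c in range(k, 4):
                a[r][c] -= factor * a[k][c]
    x = [0.0] * 3
    for k in (2, 1, 0):
        x[k] = (a[k][3] - sum(a[k][c] * x[c] for c in range(k + 1, 3))) / a[k][k]
    return x


def fit_affine(gcps):
    """Least-squares fit of X = a*col + b*row + c and Y = d*col + e*row + f.

    gcps: list of (col, row, X, Y) tuples (pixel position, projected map coordinate).
    Returns the X coefficients, the Y coefficients, and each point's residual
    distance in map units.
    """
    # Accumulate the normal equations (A^T A) p = A^T b for design rows [col, row, 1]
    ata = [[0.0] * 3 for _ in range(3)]
    atx = [0.0] * 3
    aty = [0.0] * 3
    for col, row, x, y in gcps:
        v = (col, row, 1.0)
        for i in range(3):
            atx[i] += v[i] * x
            aty[i] += v[i] * y
            for j in range(3):
                ata[i][j] += v[i] * v[j]
    px = _solve3(ata, atx)
    py = _solve3(ata, aty)
    residuals = []
    for col, row, x, y in gcps:
        ex = px[0] * col + px[1] * row + px[2] - x
        ey = py[0] * col + py[1] * row + py[2] - y
        residuals.append(math.hypot(ex, ey))
    return px, py, residuals
```

Sorting the control points by residual and comparing each against the RMS of all residuals gives a quick list of candidates to re-check before adding more points.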
Re: [CODE4LIB] Georeferencing with open source tools
On Dec 5, 2007, at 3:18 PM, Michael J. Giarlo wrote:

I think I understood about four words of this. That sounds about right. (/me ducks, jokingly) Good luck, and rock on!
-Mike

I'm not even sure if *I* understand it. I'm beginning to take a Homer Simpson-esque (as Sanitation Commissioner [1]) approach to it: Can't someone else do it?

[1] http://en.wikipedia.org/wiki/Trash_of_the_Titans

Clay
Re: [CODE4LIB] RFC 5005 ATOM extension and OAI
Hi Peter,

I completely agree with everything you just wrote, especially about Atom + APP being more than just a technology for blogs. APP is a great lightweight alternative to WebDAV, and promising for all sorts of data transfer. The fact that it has developer groundswell is a huge plus. During my Princeton days, Kevin Clarke and I briefly talked about what a METS + APP metadata editing application could do. (I can't remember the answer, but I bet it would be snazzy.)

To stay on the OAI theme, I sometimes wish metadata sharing used a push technology like APP instead of the OAI pull/harvest approach we use today. One reason is that I think it would be easier for content providers to handle deleted records via HTTP DELETE, simply because they would know to whom they had PUT or POSTed their metadata. Service providers wouldn't have to support deleted records; they'd just have to reindex. I came to this realization out of frustration that most OAI toolkits (at the time, ca. 2005) didn't support that functionality well, or at all. I don't know if that's still the case. However, the need to delete records is a reality for most projects, and OAI has somewhat awkwardly made us rethink how to delete a record in repositories and the like, on both the service provider and data provider ends. You almost have to build your entire system around handling deleted records just for OAI exposure. In practice it seems like you just end up masking or re-representing a record's outward visibility in your local systems, which gets onerous. I guess the difference is that the growing number of Atom developers are heeding the requirement for deletions, whereas the few existing OAI toolkit developers have deemed that functionality optional.

Long-winded as usual,
Clay

On Oct 24, 2007, at 12:51 AM, pkeane wrote:

This conversation about Atom is, I think, really an important one to have.
As well designed and thought out as protocols and standards such as OAI-PMH and METS (and the budding OAI-ORE spec) are, they don't have that viral technology attribute of utter simplicity. [snipped] I see numerous advantages to standardizing on Atom for any number of outward-facing services/end-points. I think it would be sad if Atom and AtomPub were seen only as technologies used by and for blogs/blogging.
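The push-style deletion described above depends on the data provider remembering where each record went. A minimal sketch of that bookkeeping, assuming an AtomPub-style exchange in which each successful POST to a service provider's collection returns the new member's URI in its Location header (as RFC 5023 specifies) and that member URI later accepts HTTP DELETE. The class and method names are hypothetical, and the code only builds the DELETE requests rather than sending them.

```python
import urllib.request


class PushedMetadata:
    """Hypothetical record of where each metadata record was pushed."""

    def __init__(self):
        self._members = {}  # record id -> set of member URIs at service providers

    def record_push(self, record_id, member_uri):
        """Remember the member URI a service provider returned for this record."""
        self._members.setdefault(record_id, set()).add(member_uri)

    def deletion_requests(self, record_id):
        """Build (but do not send) one HTTP DELETE per place the record was pushed."""
        uris = sorted(self._members.pop(record_id, set()))
        return [urllib.request.Request(uri, method="DELETE") for uri in uris]
```

Each request could then be sent with urllib.request.urlopen(req); the service provider only needs to honour the DELETE and reindex, with no OAI-style deleted-record tracking on either end.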
Re: [CODE4LIB] code4lib lucene pre-conference
I'm sure most of you have seen this, but there is a lot of good work going on at the W3C on XQuery full-text searching. LC is pushing a lot of the activity in this group, and using hefty document-centric EAD examples in the testing. http://www.w3.org/TR/xquery-full-text/

FWIW, traditionally I've been a fan of using an indexing tool that is independent from my storage. But the indexing (a subset of Lucene) that is embedded in the NXDB (X-Hive), expressed in XQuery and in use at Princeton, is good. It changed my opinion a bit about keeping the layers separated, and I now think that XQuery Full Text has a chance. We only had to switch to full, independent Lucene to implement some features, such as weighting, that the NXDB didn't include off the shelf.

Regardless, having a standards-based syntax for querying is a good thing. Or, to put it another way, at least it doesn't hurt. Those who don't want the overhead of a standard when interacting with an index don't necessarily have to use it. But for others, it will fit the bill by letting them swap in new backends that simply plug into the standard syntax.

Clay

Kevin S. Clarke wrote: On 11/27/06, Ross Singer [EMAIL PROTECTED] wrote: On 11/27/06, Kevin S. Clarke [EMAIL PROTECTED] wrote: Seriously, please don't get hung up on the 'proprietary'-ness of Lucene's query syntax. It's open, it's widely used, and has been ported to a handful of languages. I mean, why would you trade off something that works well /now/ and will most likely only get better for something that you admit sort of sucks? It's not that fulltext for XQuery sucks... it just doesn't exist (right now people do it through extensions to the language). I would expect that the spec that gets written will not be that far from Lucene's syntax. You are talking about the syntax that goes into the search box right?
I don't expect an XQuery fulltext spec will change that -- it is just how you pass that along to Lucene that will be different (e.g., do you do it in Java, in Ruby, in XML via Solr, do you do it in XQuery, etc.) And I agree with Erik's assessment that it's better to keep your repository and index separated for exactly the sort of scenario you worry about. If a super-duper new indexer comes along, you can always just switch to it, then. How do you switch to it? How do the pieces talk? This is the point of standards. If there is a standard way of addressing an index then you don't have to care what the newest greatest indexer is. This paragraph seems in contrast to your one above. Kevin
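Kevin's argument is that a standard way of addressing an index lets the indexer change without the calling code changing. A toy sketch of that decoupling in Python: FullTextIndex, ScanIndex, and InvertedIndex are hypothetical names standing in for a standard query interface and two interchangeable indexers, not the XQuery Full Text or Lucene APIs.

```python
from typing import Dict, List, Protocol, Set


class FullTextIndex(Protocol):
    """Hypothetical contract the application codes against (stand-in for a standard)."""

    def add(self, doc_id: str, text: str) -> None: ...

    def search(self, term: str) -> List[str]: ...


class ScanIndex:
    """Naive backend: scans and tokenizes every stored document on each query."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}

    def add(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text.lower()

    def search(self, term: str) -> List[str]:
        return sorted(d for d, t in self.docs.items() if term.lower() in t.split())


class InvertedIndex:
    """Faster backend: builds term -> document-id postings at index time."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = {}

    def add(self, doc_id: str, text: str) -> None:
        for token in text.lower().split():
            self.postings.setdefault(token, set()).add(doc_id)

    def search(self, term: str) -> List[str]:
        return sorted(self.postings.get(term.lower(), set()))


def run_queries(index: FullTextIndex, docs: Dict[str, str], terms: List[str]):
    """Application code: depends only on the contract, not on a particular backend."""
    for doc_id, text in docs.items():
        index.add(doc_id, text)
    return {t: index.search(t) for t in terms}
```

Swapping ScanIndex for InvertedIndex changes nothing in run_queries, which is the property a standard query syntax is meant to give at a larger scale.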
[CODE4LIB] Princeton University Library job posting
Hi all,

I invite you to have a look at a new job opening at Princeton University Library. The Digital Library Specialist position is part of the Digital Library Operations Group, and the work will focus heavily on native XML database development: XQuery, XForms, etc. Development will largely involve METS ingest and dissemination (wrapping MODS/MADS, EAD/EAC, TEI, VRA Core, DDI, and the like) in a native XML environment, as shown in the library's Digital Collections site [1]. Workflow management and business process development will also be a large focus of this position. The full job description can be found on the Princeton Library HR website: http://library.princeton.edu/hr/positions/JobDigitalLibrSpec12mth.html

Thanks,
Clay Redding

[1] http://diglib.princeton.edu