Re: [CODE4LIB] MARCXML - What is it for?
Sorry. That was rude, and uncalled for. I disagree that the problem is easily solved, even without the politics. There've been lots of attempts to try to come up with a sufficiently expressive toolset for dealing with biblio data, and we're still working on it. If you do think you've got some insight, I'm sure we're all ears, but try to frame it in terms of the existing work if you can (RDA, some of the Dublin Core stuff, etc.) so we have a frame of reference. On Mon, Oct 25, 2010 at 10:18 PM, Bill Dueber wrote: > On Mon, Oct 25, 2010 at 10:10 PM, Alexander Johannesen < > alexander.johanne...@gmail.com> wrote: > >> Political? For sure. Engineering? Not so much. > > > Ok. Solve it. Let us know when you're done. > > > > -- > Bill Dueber > Library Systems Programmer > University of Michigan Library > -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] MARCXML - What is it for?
On Mon, Oct 25, 2010 at 10:10 PM, Alexander Johannesen < alexander.johanne...@gmail.com> wrote: > Political? For sure. Engineering? Not so much. Ok. Solve it. Let us know when you're done. -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] MARCXML - What is it for?
I'm not a coder, but I undertook a study of XML some years after it came onto the scene and, with a likely confused notion that it would be the next significant technology, I learned some XSL and later was able to weave PubMed Central journal information (CSV transformed into XML) together with Dublin Core metadata of journal articles into MARCXML during harvest with MarcEdit (which the inestimable Terry Reese continues to tweak). I also used the same XML journal data to augment NLM journal records with PubMed Central holdings and other data with a transform in my IDE, though it took me weeks to get right... so, no aspirations to become a coder. I probably did not get all of the MARC cataloging rules right, and I can empathize with those who come to MARC and cataloging standards without cataloging training or experience. My library experience was primarily as library director... my expertise on library specializations would always be under question. regards, dana -- Dana Pearson dbpearsonmlis.com
Re: [CODE4LIB] MARCXML - What is it for?
On Tue, Oct 26, 2010 at 12:48 PM, Bill Dueber wrote: > Here, I think you're guilty of radically underestimating "lots of people > around the library world." No one thinks MARC is a good solution to > our modern problems, and no one who actually knows what MARC > is has trouble understanding MARC-XML as an XML serialization of > the same old data -- certainly not anyone capable of meaningful > contribution to work on an alternative. Slow down, Tex. "Lots of people in the library world" is not the same as developers, or even good developers, or even good XML developers, or even good XML developers who knows what the document model imposes to a data-centric approach. > The problem we're dealing with is *hard*. Mind-numbingly hard. This is no justification for not doing things better. (And I'd love to know what the hard bits are; always interesting to hear from various people as to what they think are the *real* problems of library problems, as opposed to any other problem they have) > The library world has several generations of infrastructure built > around MARC (by which I mean AACR2), and devising data > structures and standards that are a big enough improvement over > MARC to warrant replacing all that infrastructure is an engineering > and political nightmare. Political? For sure. Engineering? Not so much. This is just that whole "blinded by MARC" issue that keeps cropping up from time to time, and rightly so; it is truly a beast - at least the way we have come to know it through AACR2 and all its friends and its death-defying focus on all things bibliographic - that has paralyzed library innovation, probably to the point of making libraries almost irrelevant to the world. > I'm happy to take potshots at the RDA stuff from the sidelines, but I never > forget that I'm on the sidelines, and that the people active in the game are > among the best and brightest we have to offer, working on a problem that > invariably seems more intractable the deeper in you go. 
Well, that's a pretty scary sentence, for all sorts of reasons, but I think I shall not go there. > If you think MARC-XML is some sort of an actual problem What, because you don't agree with me the problem doesn't exist? :) > and that people > just need to be shouted at to realize that and do something about it, then, > well, I think you're just plain wrong. Fair enough, although you seem to be under the assumption that all of the stuff I'm saying is a figment of my imagination (I've been involved in several projects lambasted because managers think MARCXML is solving some imaginary problem; this is not bullshit, but pain and suffering from the battlefields of library development), that I'm not one of those developers (or one of you, although judging from this discussion it's clear that I am not), that the things I say somehow doesn't apply because you don't agree with, umm, what I'm assuming is my somewhat direct approach to stating my heretic opinions. Alex -- Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps --- http://shelter.nu/blog/ -- -- http://www.google.com/profiles/alexander.johannesen ---
Re: [CODE4LIB] MARCXML - What is it for?
On Mon, Oct 25, 2010 at 9:32 PM, Alexander Johannesen < alexander.johanne...@gmail.com> wrote: > Lots of people around the library world infra-structure will think > that since your data is now in XML it has taken some important step > towards being inter-operable with the rest of the world, that library > data now is part of the real world in *any* meaningful way, but this > is simply demonstrably deceivingly not true. Here, I think you're guilty of radically underestimating "lots of people around the library world." No one thinks MARC is a good solution to our modern problems, and no one who actually knows what MARC is has trouble understanding MARC-XML as an XML serialization of the same old data -- certainly not anyone capable of meaningful contribution to work on an alternative. You seem to presuppose that there's an enormous pent-up energy poised to sweep in changes to an obviously-better data format, and that the existence of MARC-XML somehow defuses all that energy. The truth is that a high percentage of people that work with MARC data actively think about (or curse) things that are wrong with it and gobs and gobs of ridiculously-smart people work on a variety of alternate solutions (not the least of which is RDA) and get their organizations to spend significant money to do so. The problem we're dealing with is *hard*. Mind-numbingly hard. The library world has several generations of infrastructure built around MARC (by which I mean AACR2), and devising data structures and standards that are a big enough improvement over MARC to warrant replacing all that infrastructure is an engineering and political nightmare. I'm happy to take potshots at the RDA stuff from the sidelines, but I never forget that I'm on the sidelines, and that the people active in the game are among the best and brightest we have to offer, working on a problem that invariably seems more intractable the deeper in you go. 
If you think MARC-XML is some sort of an actual problem, and that people just need to be shouted at to realize that and do something about it, then, well, I think you're just plain wrong. -Bill- -- Bill Dueber Library Systems Programmer University of Michigan Library
Re: [CODE4LIB] MARCXML - What is it for?
On Tue, Oct 26, 2010 at 11:56 AM, Walker, David wrote: > Your criticisms of MARC-XML all seem to presume that MARC-XML is the > goal, the end point in the process. But MARC-XML is really better seen as a > utility, a middle step between binary MARC and the real goal, which is some > other "useful and interesting" XML schema. How do you create an ontological commitment in a community to an expanding and useful set of tools and vocabularies? I think I need to remind people of what MARCXML is supposed to be ; "a framework for working with MARC data in a XML environment. This framework is intended to be flexible and extensible to allow users to work with MARC data in ways specific to their needs. The framework itself includes many components such as schemas, stylesheets, and software tools." I'm not assuming MARCXML is a goal, no matter how we define that. I'm poo-pooing MARCXML for the semantics we, as a community, have been given by a process I suspect had goals very different from reality. Very few people would "work with MARC through MARCXML", they would use it to convert it, filter it, hack around it to something else entirely. And I'm afraid lots of people are missing the point of stubbing the developments in a community by embracing tools that pushes a packet that inhibits innovation. So, here's the point, in paraphrased point; "Here's our new thing. And we did it by simply converting all our MARC into MARCXML that runs on a cron job every midnight, and a bit of horrendous XSLT that's impossible to maintain." "But it looks just like the old thing using MARC and some templates?" "Ah yes, but now we're doing it in XML!" (Yeah, yeah, your mileage will vary) I'm sorry if I'm overly pessimistic about the XML goodness in the world, not for the XML itself, but the consequences of the named entities involved. 
I've been a die-hard XML wonk for far too many years, and the tools in that tool-chest don't automatically solve hard problems better by wrapping stuff up in angle brackets, and - dare I say it? - perhaps introduce a whole fleet of other problems rarely talked about when XML is the latest buzz-word, like using a document model on what's a traditional records model, character encodings, whitespace issues, Unicode, size and efficiencies (the other part of this thread), and so on. But let me also be a bit more specific about that hard semantic problem I'm talking about; Lots of people around the library world infra-structure will think that since your data is now in XML it has taken some important step towards being inter-operable with the rest of the world, that library data now is part of the real world in *any* meaningful way, but this is simply demonstrably deceivingly not true. Having our data in XML has killed a few good projects where people have gone "A new project to convert our MARC into useful XML? Aha! LoC has already solved that problem for us." Btw, to those who find me so obnoxious, at no point do I say it was intentionally evil, just evil none the less. The road to hell is, as always, paved with good intentions. Alex -- Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps --- http://shelter.nu/blog/ -- -- http://www.google.com/profiles/alexander.johannesen ---
Re: [CODE4LIB] MARCXML - What is it for?
On Oct 25, 2010, at 8:56 PM, Walker, David wrote: > Your criticisms of MARC-XML all seem to presume that MARC-XML is the goal, > the end point in the process. But MARC-XML is really better seen as a > utility, a middle step between binary MARC and the real goal, which is some > other "useful and interesting" XML schema. Exactly. -- Eric Morgan
Re: [CODE4LIB] MARCXML - What is it for?
> b) expanding it to be actual useful and interesting. But here I think you've missed the very utility of MARC-XML. Let's say you have a binary MARC file (the kind that comes out of an ILS) and want to transform that into MODS, Dublin Core, or maybe some other XML schema. How would you do that? One way is to first transform the MARC into MARC-XML. Then you can use XSLT to crosswalk the MARC-XML into that other schema. Very handy. Your criticisms of MARC-XML all seem to presume that MARC-XML is the goal, the end point in the process. But MARC-XML is really better seen as a utility, a middle step between binary MARC and the real goal, which is some other "useful and interesting" XML schema. --Dave == David Walker Library Web Services Manager California State University http://xerxes.calstate.edu From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Alexander Johannesen [alexander.johanne...@gmail.com] Sent: Monday, October 25, 2010 12:38 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MARCXML - What is it for? Hiya, On Tue, Oct 26, 2010 at 6:26 AM, Nate Vack wrote: > Switching to an XML format doesn't help with that at all. I'm willing to take it further and say that MARCXML was the worst thing the library world ever did. Some might argue it was a good first step, and that it was better with something rather than nothing, to which I respond ; Poppycock! MARCXML is nothing short of evil. Not only does it goes against every principal of good XML anywhere (don't rely on whitespace, structure over code, namespace conventions, identity management, document control, separation of entities and properties, and on and on), it breaks the ontological commitment that a better treatment of the MARC data could bring, deterring people from actually a) using the darn thing as anything but a bare minimal crutch, and b) expanding it to be actual useful and interesting. 
The quicker the library world can get rid of this monstrosity, the better, although I doubt that will ever happen; it will hang around like a foul stench for as long as there is MARC in the world. A long time. A long sad time. A few extra notes; http://shelterit.blogspot.com/2008/09/marcxml-beast-of-burden.html Can you tell I'm not a fan? :) Kind regards, Alex -- Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps --- http://shelter.nu/blog/ -- -- http://www.google.com/profiles/alexander.johannesen ---
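A rough illustration of the "middle step" David Walker describes above (binary MARC to MARC-XML, then a crosswalk to some other schema). This is only a sketch: it swaps XSLT for Python's standard-library ElementTree, and the three-entry CROSSWALK table and the sample record are made up for illustration, not taken from any official MARC-to-Dublin-Core mapping.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical, heavily simplified crosswalk: (MARC tag, subfield code) -> DC element.
CROSSWALK = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("260", "b"): "publisher",
}

# Made-up sample record in MARC-XML form.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">ocm00000001</controlfield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Doe, Jane.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title /</subfield>
    <subfield code="c">Jane Doe.</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="b">Example Press,</subfield>
  </datafield>
</record>"""


def marcxml_to_dc(marcxml: str) -> ET.Element:
    """Map selected MARC-XML subfields onto Dublin Core elements."""
    record = ET.fromstring(marcxml)
    dc = ET.Element("metadata")
    for field in record.findall(f"{{{MARC_NS}}}datafield"):
        for sub in field.findall(f"{{{MARC_NS}}}subfield"):
            target = CROSSWALK.get((field.get("tag"), sub.get("code")))
            if target:
                el = ET.SubElement(dc, f"{{{DC_NS}}}{target}")
                # Strip trailing ISBD punctuation that MARC carries in the data.
                el.text = (sub.text or "").rstrip(" /,.")
    return dc


dc = marcxml_to_dc(SAMPLE)
for el in dc:
    print(el.tag.split("}")[1], "=", el.text)
```

The point of the sketch is only that once the record is in MARC-XML, generic XML tooling can address fields by tag and subfield code; a real crosswalk would need far more rules than this.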
Re: [CODE4LIB] Django
On Mon, Oct 25, 2010 at 6:33 PM, Gabriel Farrell wrote: > If you already know PHP you might want to check out Symfony or another > PHP framework to get the hang of web frameworks, then move onto other > languages from there. I've been using Django for a couple of years now, and have been tasked to introduce Django to a team in my current employer. Two of the developers here, both experienced in PHP but just learning Python, told me that they've found Django much simpler and easier to learn than Symfony. Besides the original Django Book, my colleagues have also enjoyed "Python Web Development with Django", which includes half a dozen simple and diverse example applications. http://www.amazon.com/Python-Development-Django-Jeff-Forcier/dp/0132356139 -- Luciano Ramalho programador repentista || stand-up programmer Twitter: @luciano
Re: [CODE4LIB] Django
I know the difference. >>> Andrew Hankinson 10/25/2010 4:40 PM >>> Django is a web framework; Python is the language. If you don't know the difference, I'd suggest sticking with PHP and going with one of the frameworks available to you there. On 2010-10-25, at 4:25 PM, Junior Tidal wrote: > Thanks for the suggestions everyone. I haven't actively looked for resources > since I'm busy doing collection development. However, I came across an > advertisement for a Django book and figured it would be a useful language to > learn. I already know php, so it seems logical that django is the next step? > > Best, > > Junior Tidal > Assistant Professor > Web Services and Multimedia Librarian > New York City College of Technology, CUNY > 300 Jay Street > Brooklyn, NY 11210 > 718.260.5481 > > http://library.citytech.cuny.edu > > Andrew Hankinson 10/25/2010 10:23 AM >>> > There's the Django Book: http://www.djangobook.com/ (Make sure you choose the > revised edition for 1.0) > The Django docs, with some intro tutorials: > http://docs.djangoproject.com/en/1.2/ > > Did you try those already? > > > On 2010-10-25, at 10:19 AM, Junior Tidal wrote: > >> Hello Code4Lib, >> >> Does anyone have any recommendations for learning Django? Books, websites, >> video tutorials, etc. ... >> >> thanks, >> >> Junior Tidal >> Assistant Professor >> Web Services and Multimedia Librarian >> New York City College of Technology, CUNY >> 300 Jay Street >> Brooklyn, NY 11210 >> 718.260.5481 >> >> http://library.citytech.cuny.edu
Re: [CODE4LIB] MARCXML - What is it for?
I know there are two parts of this discussion (speed on the one hand, applicability/features on the other), but for the former, running a little benchmark just isn't that hard. Aren't we supposed to, you know, prefer to make decisions based on data? Note: I'm only testing deserialization because there isn't, as of now, a fast serialization option for ruby-marc. It uses REXML, and it's dog-slow. I already looked at marc-in-json vs marc binary at http://robotlibrarian.billdueber.com/sizespeed-of-various-marc-serializations-using-ruby-marc/ Benchmark Source: http://gist.github.com/645683 18,883 records as either an XML collection or newline-delimited json. Open the file, read every record, pull out a title. Repeat 5 times for a total of 94,415 records (i.e., just under 100K records total). Under ruby-marc, using the libxml deserializer is the fastest option. If you're using the REXML parser, well, god help us all. ruby 1.8.7 (2010-08-16 patchlevel 302) [i686-darwin9.8.0]. User time reported in seconds. xml w/libxml 227 seconds marc-in-json w/yajl 130 seconds So quite a bit faster (more than 40%). For a million records (assuming I can just say 10*these_values) you're talking about a difference of 16 minutes due to just reading speed. Assuming, of course, you're running your code on my desktop. Today. For the 8M records I have to deal with, that'd be roughly 8M * ((227-130) / 94,415) = 7806 seconds, or about 130 minutes. So... a lot. Of course, if you're using a slower XML library or a slower JSON library, your numbers will vary quite a bit. REXML is unforgivingly slow, and json/pure (and even 'json') are quite a bit slower than yajl. And don't forget that you need to serialize these things from your source somehow... -Bill- On Mon, Oct 25, 2010 at 4:23 PM, Stephen Meyer wrote: > Kyle Banerjee wrote: > >> On Mon, Oct 25, 2010 at 12:38 PM, Tim Spalding >> wrote: >> >> Does processing speed of something matter anymore? 
You'd have to be >>> doing a LOT of processing to care, wouldn't you? >>> >>> >> Data migrations and data dumps are a common use case. Needing to break or >> make hundreds of thousands or millions of records is not uncommon. >> >> kyle >> > > To make this concrete, we processes the MARC records from 14 separate ILS's > throughout the University of Wisconsin System. We extract, sort on OCLC > number, dedup and merge pieces from any campus that has a record for the > work. The MARC that we then index and display here > > http://forward.library.wisconsin.edu/catalog/ocm37443537?school_code=WU > > is not identical to the version of the MARC record from any of the 4 > schools that hold it. > > We extract 13 million records and dedup down to 8 million every week. Speed > is paramount. > > -sm > -- > Stephen Meyer > Library Application Developer > UW-Madison Libraries > 436 Memorial Library > 728 State St. > Madison, WI 53706 > > sme...@library.wisc.edu > 608-265-2844 (ph) > > > "Just don't let the human factor fail to be a factor at all." > - Andrew Bird, "Tables and Chairs" > -- Bill Dueber Library Systems Programmer University of Michigan Library
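For anyone who wants to repeat the kind of measurement Bill describes (read every record, pull out a title, compare XML against newline-delimited JSON) without ruby-marc, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the records are synthetic, the parsers are the standard-library ones rather than libxml or yajl, and the timings it prints say nothing about the Ruby numbers quoted above.

```python
import json
import time
import xml.etree.ElementTree as ET

NS = "http://www.loc.gov/MARC21/slim"


def make_records(n):
    """Build n synthetic records in both serializations (illustrative data only)."""
    xml_parts, json_lines = [], []
    for i in range(n):
        title = f"Title number {i}"
        xml_parts.append(
            "<record><leader>00000nam a2200000 a 4500</leader>"
            f'<controlfield tag="001">{i}</controlfield>'
            '<datafield tag="245" ind1="1" ind2="0">'
            f'<subfield code="a">{title}</subfield></datafield></record>'
        )
        json_lines.append(json.dumps({
            "leader": "00000nam a2200000 a 4500",
            "fields": [
                {"001": str(i)},
                {"245": {"ind1": "1", "ind2": "0", "subfields": [{"a": title}]}},
            ],
        }))
    xml_doc = f'<collection xmlns="{NS}">' + "".join(xml_parts) + "</collection>"
    return xml_doc, "\n".join(json_lines)


def titles_from_xml(xml_doc):
    """Parse the whole collection and collect every 245$a."""
    root = ET.fromstring(xml_doc)
    path = f"{{{NS}}}record/{{{NS}}}datafield[@tag='245']/{{{NS}}}subfield[@code='a']"
    return [sub.text for sub in root.findall(path)]


def titles_from_json(ndjson):
    """Parse one JSON record per line and collect every 245$a."""
    titles = []
    for line in ndjson.splitlines():
        record = json.loads(line)
        for field in record["fields"]:
            data = field.get("245")
            if data:
                titles.extend(sf["a"] for sf in data["subfields"] if "a" in sf)
    return titles


def timed(fn, arg):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(arg)
    return result, time.perf_counter() - start


xml_doc, ndjson = make_records(5000)
xml_titles, xml_secs = timed(titles_from_xml, xml_doc)
json_titles, json_secs = timed(titles_from_json, ndjson)
assert xml_titles == json_titles  # both paths must extract the same data
print(f"xml:  {xml_secs:.3f}s for {len(xml_titles)} records")
print(f"json: {json_secs:.3f}s for {len(json_titles)} records")
```

Checking that both code paths return identical titles before comparing times is the important design choice: a benchmark that quietly does less work on one side is not measuring the serialization.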
Re: [CODE4LIB] MARCXML - What is it for?
Ray Denenberg, Library of Congress wrote: > It really is possible to make your point without being quite so obnoxious. Obnoxious? Alex -- Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps --- http://shelter.nu/blog/ -- -- http://www.google.com/profiles/alexander.johannesen ---
Re: [CODE4LIB] Django
Django is a web framework; Python is the language. If you don't know the difference, I'd suggest sticking with PHP and going with one of the frameworks available to you there. On 2010-10-25, at 4:25 PM, Junior Tidal wrote: > Thanks for the suggestions everyone. I haven't actively looked for resources > since I'm busy doing collection development. However, I came across an > advertisement for a Django book and figured it would be a useful language to > learn. I already know php, so it seems logical that django is the next step? > > Best, > > Junior Tidal > Assistant Professor > Web Services and Multimedia Librarian > New York City College of Technology, CUNY > 300 Jay Street > Brooklyn, NY 11210 > 718.260.5481 > > http://library.citytech.cuny.edu > > Andrew Hankinson 10/25/2010 10:23 AM >>> > There's the Django Book: http://www.djangobook.com/ (Make sure you choose the > revised edition for 1.0) > The Django docs, with some intro tutorials: > http://docs.djangoproject.com/en/1.2/ > > Did you try those already? > > > On 2010-10-25, at 10:19 AM, Junior Tidal wrote: > >> Hello Code4Lib, >> >> Does anyone have any recommendations for learning Django? Books, websites, >> video tutorials, etc. ... >> >> thanks, >> >> Junior Tidal >> Assistant Professor >> Web Services and Multimedia Librarian >> New York City College of Technology, CUNY >> 300 Jay Street >> Brooklyn, NY 11210 >> 718.260.5481 >> >> http://library.citytech.cuny.edu
Re: [CODE4LIB] MARCXML - What is it for?
It really is possible to make your point without being quite so obnoxious. Everyone else seems to be able to do so. --Ray -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Alexander Johannesen Sent: Monday, October 25, 2010 3:38 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MARCXML - What is it for? Hiya, On Tue, Oct 26, 2010 at 6:26 AM, Nate Vack wrote: > Switching to an XML format doesn't help with that at all. I'm willing to take it further and say that MARCXML was the worst thing the library world ever did. Some might argue it was a good first step, and that it was better with something rather than nothing, to which I respond ; Poppycock! MARCXML is nothing short of evil. Not only does it goes against every principal of good XML anywhere (don't rely on whitespace, structure over code, namespace conventions, identity management, document control, separation of entities and properties, and on and on), it breaks the ontological commitment that a better treatment of the MARC data could bring, deterring people from actually a) using the darn thing as anything but a bare minimal crutch, and b) expanding it to be actual useful and interesting. The quicker the library world can get rid of this monstrosity, the better, although I doubt that will ever happen; it will hang around like a foul stench for as long as there is MARC in the world. A long time. A long sad time. A few extra notes; http://shelterit.blogspot.com/2008/09/marcxml-beast-of-burden.html Can you tell I'm not a fan? :) Kind regards, Alex -- Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps --- http://shelter.nu/blog/ -- -- http://www.google.com/profiles/alexander.johannesen ---
Re: [CODE4LIB] Django
If you already know PHP you might want to check out Symfony or another PHP framework to get the hang of web frameworks, then move onto other languages from there. On Mon, Oct 25, 2010 at 4:25 PM, Junior Tidal wrote: > Thanks for the suggestions everyone. I haven't actively looked for resources > since I'm busy doing collection development. However, I came across an > advertisement for a Django book and figured it would be a useful language to > learn. I already know php, so it seems logical that django is the next step? > > Best, > > Junior Tidal > Assistant Professor > Web Services and Multimedia Librarian > New York City College of Technology, CUNY > 300 Jay Street > Brooklyn, NY 11210 > 718.260.5481 > > http://library.citytech.cuny.edu > > Andrew Hankinson 10/25/2010 10:23 AM >>> > There's the Django Book: http://www.djangobook.com/ (Make sure you choose the > revised edition for 1.0) > The Django docs, with some intro tutorials: > http://docs.djangoproject.com/en/1.2/ > > Did you try those already? > > > On 2010-10-25, at 10:19 AM, Junior Tidal wrote: > >> Hello Code4Lib, >> >> Does anyone have any recommendations for learning Django? Books, websites, >> video tutorials, etc. ... >> >> thanks, >> >> Junior Tidal >> Assistant Professor >> Web Services and Multimedia Librarian >> New York City College of Technology, CUNY >> 300 Jay Street >> Brooklyn, NY 11210 >> 718.260.5481 >> >> http://library.citytech.cuny.edu >
Re: [CODE4LIB] Django
Agreed on the docs at the website. If you can't figure something out from those, dig into the source. Happy hacking! On Mon, Oct 25, 2010 at 10:25 AM, Michael J. Giarlo wrote: > I'd start here: > > http://docs.djangoproject.com/en/1.2/ > > There are some tutorials in there as well. > > -Mike > > > > On Mon, Oct 25, 2010 at 10:19, Junior Tidal wrote: >> Hello Code4Lib, >> >> Does anyone have any recommendations for learning Django? Books, websites, >> video tutorials, etc. ... >> >> thanks, >> >> Junior Tidal >> Assistant Professor >> Web Services and Multimedia Librarian >> New York City College of Technology, CUNY >> 300 Jay Street >> Brooklyn, NY 11210 >> 718.260.5481 >> >> http://library.citytech.cuny.edu >> >
Re: [CODE4LIB] Django
Thanks for the suggestions everyone. I haven't actively looked for resources since I'm busy doing collection development. However, I came across an advertisement for a Django book and figured it would be a useful language to learn. I already know php, so it seems logical that django is the next step? Best, Junior Tidal Assistant Professor Web Services and Multimedia Librarian New York City College of Technology, CUNY 300 Jay Street Brooklyn, NY 11210 718.260.5481 http://library.citytech.cuny.edu >>> Andrew Hankinson 10/25/2010 10:23 AM >>> There's the Django Book: http://www.djangobook.com/ (Make sure you choose the revised edition for 1.0) The Django docs, with some intro tutorials: http://docs.djangoproject.com/en/1.2/ Did you try those already? On 2010-10-25, at 10:19 AM, Junior Tidal wrote: > Hello Code4Lib, > > Does anyone have any recommendations for learning Django? Books, websites, > video tutorials, etc. ... > > thanks, > > Junior Tidal > Assistant Professor > Web Services and Multimedia Librarian > New York City College of Technology, CUNY > 300 Jay Street > Brooklyn, NY 11210 > 718.260.5481 > > http://library.citytech.cuny.edu
Re: [CODE4LIB] MARCXML - What is it for?
Kyle Banerjee wrote: On Mon, Oct 25, 2010 at 12:38 PM, Tim Spalding wrote: Does processing speed of something matter anymore? You'd have to be doing a LOT of processing to care, wouldn't you? Data migrations and data dumps are a common use case. Needing to break or make hundreds of thousands or millions of records is not uncommon. kyle To make this concrete, we process the MARC records from 14 separate ILS's throughout the University of Wisconsin System. We extract, sort on OCLC number, dedup and merge pieces from any campus that has a record for the work. The MARC that we then index and display here http://forward.library.wisconsin.edu/catalog/ocm37443537?school_code=WU is not identical to the version of the MARC record from any of the 4 schools that hold it. We extract 13 million records and dedup down to 8 million every week. Speed is paramount. -sm -- Stephen Meyer Library Application Developer UW-Madison Libraries 436 Memorial Library 728 State St. Madison, WI 53706 sme...@library.wisc.edu 608-265-2844 (ph) "Just don't let the human factor fail to be a factor at all." - Andrew Bird, "Tables and Chairs"
Re: [CODE4LIB] MARCXML - What is it for?
JSON++ I routinely re-index about 2.5M JSON records (originally from binary MARC), and it's several orders of magnitude faster than XML (measured in single-digit minutes rather than double-digit hours). I'm not sure if it's in the same range as binary MARC, but as Tim says, it's plenty fast enough for pragmatic purposes. Unfortunately JSON doesn't have as many mature tools for manipulation as XML (yet?), but I'd be inclined to call it the best of both worlds rather than a middle-ground or compromise. MJ > Marc in JSON can be a nice middle-ground, faster/smaller than MarcXML > (although still probably not as binary), based on a standard low-level data > format so easier to work with using existing tools (and developers eyes) than > binary, no maximum record length. > There have been a couple competing attempts to define a > marc-expressed-in-json 'standard', none have really caught on yet. I like > Ross's latest attempt: > http://dilettantes.code4lib.org/blog/2010/09/a-proposal-to-serialize-marc-in-json/ > > Patrick Hochstenbach wrote: >> Dear Nate, >> >> There is a trade-off: do you want very fast processing of data -> go for >> binary data. do you want to share your data globally easily in many (not per >> se library related) environments -> go for XML/RDF. Open your data and do >> both :-) >> >> Pat >> >> Sent from my iPhone >> >> On 25 Oct 2010, at 20:39, "Nate Vack" wrote: >> >> >>> Hi all, >>> >>> I've just spent the last couple of weeks delving into and decoding a >>> binary file format. This, in turn, got me thinking about MARCXML. >>> >>> In a nutshell, it looks like it's supposed to contain the exact same >>> data as a normal MARC record, except in XML form. As in, it should be >>> round-trippable. >>> >>> What's the advantage to this? I can see using a human-readable format >>> for poorly-documented file formats -- they're relatively easy to read >>> and understand. 
But MARC is well, well-documented, with more than one >>> free implementation in cursory searching. And once you know a binary >>> file's format, it's no harder to parse than XML, and the data's >>> smaller and processing faster. >>> >>> So... why the XML? >>> >>> Curious, >>> -Nate >>> >> >>
Re: [CODE4LIB] MARCXML - What is it for?
Tim Spalding wrote: Does processing speed of something matter anymore? You'd have to be doing a LOT of processing to care, wouldn't you? Yes, which sometimes you are. Say, when you're indexing 2 or 3 or 10 million marc records into, say, solr. Which is faster depends on what language and what libraries you are using for both binary marc and marcxml. But in many of our experiences, parsing and serializing binary marc _is_ significantly faster than parsing and serializing marcxml. That is of course just one of the various criteria that come into play when choosing a format. Here are Bill Dueber's benchmarks comparing MarcXML, marc binary, and a marc-in-json format; in ruby, using various library alternatives. I rather like the marc-in-json format for being a happy medium. Whether it's "standard" or not doesn't necessarily matter when you're dealing with your own records, passing them through several stops on a toolchain, and have tools available that can do it. Who cares if any/everyone else uses it. http://robotlibrarian.billdueber.com/sizespeed-of-various-marc-serializations-using-ruby-marc/
Re: [CODE4LIB] MARCXML - What is it for?
Marc in JSON can be a nice middle-ground, faster/smaller than MarcXML (although still probably not as fast/small as binary), based on a standard low-level data format so easier to work with using existing tools (and developers' eyes) than binary, no maximum record length. There have been a couple of competing attempts to define a marc-expressed-in-json 'standard'; none have really caught on yet. I like Ross's latest attempt: http://dilettantes.code4lib.org/blog/2010/09/a-proposal-to-serialize-marc-in-json/ Patrick Hochstenbach wrote: Dear Nate, There is a trade-off: do you want very fast processing of data -> go for binary data. do you want to share your data globally easily in many (not per se library related) environments -> go for XML/RDF. Open your data and do both :-) Pat Sent from my iPhone On 25 Oct 2010, at 20:39, "Nate Vack" wrote: Hi all, I've just spent the last couple of weeks delving into and decoding a binary file format. This, in turn, got me thinking about MARCXML. In a nutshell, it looks like it's supposed to contain the exact same data as a normal MARC record, except in XML form. As in, it should be round-trippable. What's the advantage to this? I can see using a human-readable format for poorly-documented file formats -- they're relatively easy to read and understand. But MARC is well, well-documented, with more than one free implementation in cursory searching. And once you know a binary file's format, it's no harder to parse than XML, and the data's smaller and processing faster. So... why the XML? Curious, -Nate
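To make the marc-in-json idea a little more concrete, here is a sketch in Python of one record in a leader-plus-fields layout along the lines of the proposal linked above, converted to MARCXML and back. Treat the exact JSON shape as my reading of that proposal rather than a normative definition; the sample record and the two helper functions (json_to_marcxml, marcxml_to_json) are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

NS = "http://www.loc.gov/MARC21/slim"

# Made-up record: control fields map tag -> string, data fields map
# tag -> indicators plus an ordered list of one-key subfield objects.
record = {
    "leader": "00000nam a2200000 a 4500",
    "fields": [
        {"001": "ocm00000001"},
        {"245": {"ind1": "1", "ind2": "0",
                 "subfields": [{"a": "An example title /"}, {"c": "Jane Doe."}]}},
    ],
}


def json_to_marcxml(rec):
    """Build a MARCXML <record> element from the JSON structure."""
    root = ET.Element(f"{{{NS}}}record")
    ET.SubElement(root, f"{{{NS}}}leader").text = rec["leader"]
    for field in rec["fields"]:
        tag, value = next(iter(field.items()))
        if isinstance(value, str):  # control field: plain string value
            ET.SubElement(root, f"{{{NS}}}controlfield", {"tag": tag}).text = value
        else:  # data field: indicators and subfields
            attrs = {"tag": tag, "ind1": value["ind1"], "ind2": value["ind2"]}
            df = ET.SubElement(root, f"{{{NS}}}datafield", attrs)
            for sf in value["subfields"]:
                code, text = next(iter(sf.items()))
                ET.SubElement(df, f"{{{NS}}}subfield", {"code": code}).text = text
    return root


def marcxml_to_json(root):
    """Rebuild the JSON structure from a MARCXML <record> element."""
    rec = {"leader": root.findtext(f"{{{NS}}}leader"), "fields": []}
    for el in root:
        name = el.tag.split("}")[1]
        if name == "controlfield":
            rec["fields"].append({el.get("tag"): el.text})
        elif name == "datafield":
            rec["fields"].append({el.get("tag"): {
                "ind1": el.get("ind1"),
                "ind2": el.get("ind2"),
                "subfields": [{sf.get("code"): sf.text} for sf in el],
            }})
    return rec


xml_bytes = ET.tostring(json_to_marcxml(record))
round_tripped = marcxml_to_json(ET.fromstring(xml_bytes))
assert round_tripped == record  # nothing lost in either direction
print(len(json.dumps(record)), "bytes as JSON;", len(xml_bytes), "bytes as XML")
```

Field and subfield order is kept in lists on purpose, since MARC allows repeated tags and repeated subfield codes and their order can matter.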
Re: [CODE4LIB] MARCXML - What is it for?
MODS was an attempt to mostly-but-not-entirely-roundtrippably represent data in MARC in a format that's more 'normal' XML, without packed bytes in elements, with element names that are more or less self-documenting, etc. It's caught on even less than MARCXML though, so if you find MARCXML under-adopted (I disagree), you won't like MODS. Personally I think MODS is kind of the worst of both worlds. The only reason to stick with something that looks anything like MARC is to be round-trippable with legacy MARC, which MODS is not. But if you're going to give that up, you really want more improvements than MODS supplies; it's still got a lot of the unfortunate legacy of MARC in it. Nate Vack wrote: On Mon, Oct 25, 2010 at 2:09 PM, Tim Spalding wrote: - XML is self-describing, binary is not. Not to quibble, but that's only in a theoretical sense here. Something like Amazon XML is truly self-describing. MARCXML is self-obfuscating. At least MARC records kinda imitate catalog cards. Yeah -- this is kinda the source of my confusion. In the case of the files I'm reading, it's not that it's hard to find out where the nMeasurement field lives (it's six short ints starting at offset 64), but what the field means, and whether or not I care about it. Switching to an XML format doesn't help with that at all. WRT character encoding issues and validation: if MARC and MARCXML are round-trippable, a solution in one environment is equivalent to a solution in the other. And I think we've all seen plenty of unvalidated, badly-formed XML, and plenty with Character Encoding Problems™ ;-) Thanks for the input! -Nate
Re: [CODE4LIB] MARCXML - What is it for?
Yes, it is designed to be a round-trippable expression of ordinary marc in XML. Some reasons this is useful: 1. No maximum record length, unlike actual marc which tops out at ~100k (99,999 bytes, since the record length is a five-digit field in the leader). 2. You can use XSLT and other XML tools to work with it, and store it in stores optimized for XML (or that only accept XML), etc. 3. You can embed it inside XML schemas that allow arbitrary embeddable XML. 4. (Of much lesser importance than these others, but still ends up being important to me -- saving the time of the developer does matter) it's a lot easier to debug the raw data, doesn't require me to open up a hex editor and count bytes. Nate Vack wrote: Hi all, I've just spent the last couple of weeks delving into and decoding a binary file format. This, in turn, got me thinking about MARCXML. In a nutshell, it looks like it's supposed to contain the exact same data as a normal MARC record, except in XML form. As in, it should be round-trippable. What's the advantage to this? I can see using a human-readable format for poorly-documented file formats -- they're relatively easy to read and understand. But MARC is well, well-documented, with more than one free implementation in cursory searching. And once you know a binary file's format, it's no harder to parse than XML, and the data's smaller and processing faster. So... why the XML? Curious, -Nate
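To make the byte counting concrete, here is a small Python sketch of the binary (ISO 2709) layout: a 24-byte leader whose first five bytes hold the record length (hence the 99,999-byte ceiling), a directory of 12-byte entries (3-byte tag, 4-digit field length, 5-digit start offset), then the field data. `build_record` and `parse_record` are hypothetical helpers written for this note, not part of any MARC library, and the leader values other than the two offsets are placeholders.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"


def build_record(fields):
    """Serialize (tag, body_bytes) pairs into one binary record (demo only)."""
    directory = b""
    data = b""
    for tag, body in fields:
        body += FIELD_TERMINATOR
        # Directory entry: tag, field length, offset from the base address.
        directory += b"%s%04d%05d" % (tag.encode("ascii"), len(body), len(data))
        data += body
    base = 24 + len(directory) + 1      # leader + directory + its terminator
    total = base + len(data) + 1        # plus the record terminator
    leader = b"%05dnam a22%05d a 4500" % (total, base)
    return leader + directory + FIELD_TERMINATOR + data + RECORD_TERMINATOR


def parse_record(raw):
    """Return (record_length, [(tag, body_bytes), ...]) from one binary record."""
    record_length = int(raw[0:5])       # leader bytes 0-4
    base_address = int(raw[12:17])      # leader bytes 12-16
    directory = raw[24:base_address - 1]
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        # Drop the trailing field terminator from the body.
        body = raw[base_address + start:base_address + start + length - 1]
        fields.append((tag, body))
    return record_length, fields


raw = build_record([
    ("001", b"ocm12345"),
    ("245", b"10" + SUBFIELD_DELIMITER + b"aAn example title"),
])
length, fields = parse_record(raw)
```

None of this is hard, but every offset has to be right, and one bad length in the directory throws off everything after it. That is the debugging cost point 4 is talking about.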
Re: [CODE4LIB] MARCXML - What is it for?
On Mon, Oct 25, 2010 at 12:22 PM, Eric Hellman wrote: > I think you'd have a very hard time demonstrating any speed advantage to > MARC over MARCXML. XML parsers have been speed optimized out the wazoo; If > there exists a MARC parser that has ever been speed-optimized without > serious compromise, I'm sure someone on this list will have a good story > about it. I'll take MarcEdit over an XML parser for MARCXML any day. For a benchmark test, try roundtripping a million records. Unless I've been messing with the wrong stuff, the differences are dramatic. kyle
Re: [CODE4LIB] MARCXML - What is it for?
Does processing speed of something matter anymore? You'd have to be doing a LOT of processing to care, wouldn't you? Tim On Mon, Oct 25, 2010 at 3:35 PM, MJ Suhonos wrote: > I'll just leave this here: > > http://www.indexdata.com/blog/2010/05/turbomarc-faster-xml-marc-records > > That trade-off ought to offend both camps, though I happen to think it's > quite clever. > > MJ > > On 2010-10-25, at 3:22 PM, Eric Hellman wrote: > >> I think you'd have a very hard time demonstrating any speed advantage to >> MARC over MARCXML. XML parsers have been speed optimized out the wazoo; If >> there exists a MARC parser that has ever been speed-optimized without >> serious compromise, I'm sure someone on this list will have a good story >> about it. >> >> On Oct 25, 2010, at 3:05 PM, Patrick Hochstenbach wrote: >> >>> Dear Nate, >>> >>> There is a trade-off: do you want very fast processing of data -> go for >>> binary data. do you want to share your data globally easily in many (not >>> per se library related) environments -> go for XML/RDF. >>> Open your data and do both :-) >>> >>> Pat >>> >>> Sent from my iPhone >>> >>> On 25 Oct 2010, at 20:39, "Nate Vack" wrote: >>> Hi all, I've just spent the last couple of weeks delving into and decoding a binary file format. This, in turn, got me thinking about MARCXML. In a nutshell, it looks like it's supposed to contain the exact same data as a normal MARC record, except in XML form. As in, it should be round-trippable. What's the advantage to this? I can see using a human-readable format for poorly-documented file formats -- they're relatively easy to read and understand. But MARC is well, well-documented, with more than one free implementation in cursory searching. And once you know a binary file's format, it's no harder to parse than XML, and the data's smaller and processing faster. So... why the XML? Curious, -Nate >> >> Eric Hellman >> President, Gluejar, Inc. 
>> 41 Watchung Plaza, #132 >> Montclair, NJ 07042 >> USA >> >> e...@hellman.net >> http://go-to-hellman.blogspot.com/ >> @gluejar > -- Check out my library at http://www.librarything.com/profile/timspalding
Re: [CODE4LIB] MARCXML - What is it for?
On Mon, Oct 25, 2010 at 12:38 PM, Tim Spalding wrote: > Does processing speed of something matter anymore? You'd have to be > doing a LOT of processing to care, wouldn't you? > Data migrations and data dumps are a common use case. Needing to break or make hundreds of thousands or millions of records is not uncommon. kyle
Re: [CODE4LIB] MARCXML - What is it for?
I'll just leave this here: http://www.indexdata.com/blog/2010/05/turbomarc-faster-xml-marc-records That trade-off ought to offend both camps, though I happen to think it's quite clever. MJ On 2010-10-25, at 3:22 PM, Eric Hellman wrote: > I think you'd have a very hard time demonstrating any speed advantage to MARC > over MARCXML. XML parsers have been speed optimized out the wazoo; If there > exists a MARC parser that has ever been speed-optimized without serious > compromise, I'm sure someone on this list will have a good story about it. > > On Oct 25, 2010, at 3:05 PM, Patrick Hochstenbach wrote: > >> Dear Nate, >> >> There is a trade-off: do you want very fast processing of data -> go for >> binary data. do you want to share your data globally easily in many (not per >> se library related) environments -> go for XML/RDF. >> Open your data and do both :-) >> >> Pat >> >> Sent from my iPhone >> >> On 25 Oct 2010, at 20:39, "Nate Vack" wrote: >> >>> Hi all, >>> >>> I've just spent the last couple of weeks delving into and decoding a >>> binary file format. This, in turn, got me thinking about MARCXML. >>> >>> In a nutshell, it looks like it's supposed to contain the exact same >>> data as a normal MARC record, except in XML form. As in, it should be >>> round-trippable. >>> >>> What's the advantage to this? I can see using a human-readable format >>> for poorly-documented file formats -- they're relatively easy to read >>> and understand. But MARC is well, well-documented, with more than one >>> free implementation in cursory searching. And once you know a binary >>> file's format, it's no harder to parse than XML, and the data's >>> smaller and processing faster. >>> >>> So... why the XML? >>> >>> Curious, >>> -Nate > > Eric Hellman > President, Gluejar, Inc. > 41 Watchung Plaza, #132 > Montclair, NJ 07042 > USA > > e...@hellman.net > http://go-to-hellman.blogspot.com/ > @gluejar
Re: [CODE4LIB] MARCXML - What is it for?
I guess what I meant is that in MARCXML, you have a element with subsequent elements each with fairly clear attributes, which, while not my idea of fun Sunday-afternoon reading, requires less specialized tools to parse (hello Textmate!) and is a bit easier than trying to count INT positions. One quick XPath query and you can have all 245 fields, regardless of their length or position in the record. On 2010-10-25, at 3:26 PM, Nate Vack wrote: > On Mon, Oct 25, 2010 at 2:09 PM, Tim Spalding wrote: >> - XML is self-describing, binary is not. >> >> Not to quibble, but that's only in a theoretical sense here. Something >> like Amazon XML is truly self-describing. MARCXML is self-obfuscating. >> At least MARC records kinda imitate catalog cards. > > Yeah -- this is kinda the source of my confusion. In the case of the > files I'm reading, it's not that it's hard to find out where the > nMeasurement field lives (it's six short ints starting at offset 64), > but what the field means, and whether or not I care about it. > > Switching to an XML format doesn't help with that at all. > > WRT character encoding issues and validation: if MARC and MARCXML are > round-trippable, a solution in one environment is equivalent to a > solution in the other. > > And I think we've all seen plenty of unvalidated, badly-formed XML, > and plenty with Character Encoding Problems™ ;-) > > Thanks for the input! > -Nate
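The "one quick XPath query" point can be sketched with nothing but the Python standard library. The namespace URI is the MARCXML "slim" namespace; the sample record and the `titles` helper are invented for illustration, and ElementTree only supports a subset of XPath, so a fuller query would need something like lxml.

```python
import xml.etree.ElementTree as ET

# MARCXML ("MARC21 slim") namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical sample data, not from a real catalog.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>00000nam a2200000 a 4500</leader>
    <controlfield tag="001">ocm12345</controlfield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">An example title /</subfield>
      <subfield code="c">A. Author.</subfield>
    </datafield>
  </record>
</collection>"""


def titles(xml_text):
    """Return the joined subfields of every 245 field in the document."""
    root = ET.fromstring(xml_text)
    results = []
    # The predicate picks out 245 fields wherever they sit in the record.
    for field in root.findall(".//marc:datafield[@tag='245']", MARC_NS):
        parts = [sf.text or "" for sf in field.findall("marc:subfield", MARC_NS)]
        results.append(" ".join(parts))
    return results


found = titles(SAMPLE)
```

No directory offsets, no field lengths: the query works the same whether the 245 is the second field or the fiftieth.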
Re: [CODE4LIB] MARCXML - What is it for?
Hiya, On Tue, Oct 26, 2010 at 6:26 AM, Nate Vack wrote: > Switching to an XML format doesn't help with that at all. I'm willing to take it further and say that MARCXML was the worst thing the library world ever did. Some might argue it was a good first step, and that it was better with something rather than nothing, to which I respond: Poppycock! MARCXML is nothing short of evil. Not only does it go against every principle of good XML anywhere (don't rely on whitespace, structure over code, namespace conventions, identity management, document control, separation of entities and properties, and on and on), it breaks the ontological commitment that a better treatment of the MARC data could bring, deterring people from actually a) using the darn thing as anything but a bare minimal crutch, and b) expanding it to be actually useful and interesting. The quicker the library world can get rid of this monstrosity, the better, although I doubt that will ever happen; it will hang around like a foul stench for as long as there is MARC in the world. A long time. A long sad time. A few extra notes: http://shelterit.blogspot.com/2008/09/marcxml-beast-of-burden.html Can you tell I'm not a fan? :) Kind regards, Alex -- Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps --- http://shelter.nu/blog/ -- -- http://www.google.com/profiles/alexander.johannesen ---
Re: [CODE4LIB] MARCXML - What is it for?
On Mon, Oct 25, 2010 at 2:09 PM, Tim Spalding wrote: > - XML is self-describing, binary is not. > > Not to quibble, but that's only in a theoretical sense here. Something > like Amazon XML is truly self-describing. MARCXML is self-obfuscating. > At least MARC records kinda imitate catalog cards. Yeah -- this is kinda the source of my confusion. In the case of the files I'm reading, it's not that it's hard to find out where the nMeasurement field lives (it's six short ints starting at offset 64), but what the field means, and whether or not I care about it. Switching to an XML format doesn't help with that at all. WRT character encoding issues and validation: if MARC and MARCXML are round-trippable, a solution in one environment is equivalent to a solution in the other. And I think we've all seen plenty of unvalidated, badly-formed XML, and plenty with Character Encoding Problems™ ;-) Thanks for the input! -Nate
Re: [CODE4LIB] MARCXML - What is it for?
I think you'd have a very hard time demonstrating any speed advantage to MARC over MARCXML. XML parsers have been speed optimized out the wazoo; If there exists a MARC parser that has ever been speed-optimized without serious compromise, I'm sure someone on this list will have a good story about it. On Oct 25, 2010, at 3:05 PM, Patrick Hochstenbach wrote: > Dear Nate, > > There is a trade-off: do you want very fast processing of data -> go for > binary data. do you want to share your data globally easily in many (not per > se library related) environments -> go for XML/RDF. > Open your data and do both :-) > > Pat > > Sent from my iPhone > > On 25 Oct 2010, at 20:39, "Nate Vack" wrote: > >> Hi all, >> >> I've just spent the last couple of weeks delving into and decoding a >> binary file format. This, in turn, got me thinking about MARCXML. >> >> In a nutshell, it looks like it's supposed to contain the exact same >> data as a normal MARC record, except in XML form. As in, it should be >> round-trippable. >> >> What's the advantage to this? I can see using a human-readable format >> for poorly-documented file formats -- they're relatively easy to read >> and understand. But MARC is well, well-documented, with more than one >> free implementation in cursory searching. And once you know a binary >> file's format, it's no harder to parse than XML, and the data's >> smaller and processing faster. >> >> So... why the XML? >> >> Curious, >> -Nate Eric Hellman President, Gluejar, Inc. 41 Watchung Plaza, #132 Montclair, NJ 07042 USA e...@hellman.net http://go-to-hellman.blogspot.com/ @gluejar
Re: [CODE4LIB] MARCXML - What is it for?
On Monday, October 25, 2010 1:50 PM, Andrew Hankinson wrote: >- Documents can be validated for their "well-formedness" using these existing >tools and a pre-defined schema (a validator for MARC would need to be >custom-coded) In Perl, MARC::Lint might be an example of such a validator (though I need to update it with the most recent MARC updates at some point soon). MarcEdit also includes a validator. Bryan Baldus bryan.bal...@quality-books.com eij...@cpan.org http://home.comcast.net/~eijabb/
Re: [CODE4LIB] MARCXML - What is it for?
- XML is self-describing, binary is not. Not to quibble, but that's only in a theoretical sense here. Something like Amazon XML is truly self-describing. MARCXML is self-obfuscating. At least MARC records kinda imitate catalog cards. :) Tim On Mon, Oct 25, 2010 at 2:50 PM, Andrew Hankinson wrote: > I'm not a big user of MARCXML, but I can think of a few reasons off the top > of my head: > > - Existing libraries for reading, manipulating and searching XML-based > documents are very mature. > - Documents can be validated for their "well-formedness" using these existing > tools and a pre-defined schema (a validator for MARC would need to be > custom-coded) > - MARCXML can easily be incorporated into XML-based meta-metadata schemas, > like METS. > - It can be parsed and manipulated in a web service context without sending a > binary blob over the wire. > - XML is self-describing, binary is not. > > There's nothing stopping you from reading the MARCXML into a binary blob and > working on it from there. But when sharing documents from different > institutions around the globe, using a wide variety of tools and techniques, > XML seems to be the lowest common denominator. > > -Andrew > > On 2010-10-25, at 2:38 PM, Nate Vack wrote: > >> Hi all, >> >> I've just spent the last couple of weeks delving into and decoding a >> binary file format. This, in turn, got me thinking about MARCXML. >> >> In a nutshell, it looks like it's supposed to contain the exact same >> data as a normal MARC record, except in XML form. As in, it should be >> round-trippable. >> >> What's the advantage to this? I can see using a human-readable format >> for poorly-documented file formats -- they're relatively easy to read >> and understand. But MARC is well, well-documented, with more than one >> free implementation in cursory searching. And once you know a binary >> file's format, it's no harder to parse than XML, and the data's >> smaller and processing faster. >> >> So... why the XML? 
>> >> Curious, >> -Nate > -- Check out my library at http://www.librarything.com/profile/timspalding
Re: [CODE4LIB] MARCXML - What is it for?
It's helpful to think of MARCXML as a sort of lingua franca. > - Existing libraries for reading, manipulating and searching XML-based > documents are very mature. Including XSLT and XPath; very powerful stuff. > There's nothing stopping you from reading the MARCXML into a binary blob and > working on it from there. But when sharing documents from different > institutions around the globe, using a wide variety of tools and techniques, > XML seems to be the lowest common denominator. Assuming it's also round-trippable, MARC-in-JSON would accomplish this as well. Not to mention it's nice to be able to read and edit MARC records in any (any!!) text editor for those of us who are comfortable looking at JSON or XML but can't handle staring at binary bytestreams without having an aneurysm. MJ > On 2010-10-25, at 2:38 PM, Nate Vack wrote: > >> Hi all, >> >> I've just spent the last couple of weeks delving into and decoding a >> binary file format. This, in turn, got me thinking about MARCXML. >> >> In a nutshell, it looks like it's supposed to contain the exact same >> data as a normal MARC record, except in XML form. As in, it should be >> round-trippable. >> >> What's the advantage to this? I can see using a human-readable format >> for poorly-documented file formats -- they're relatively easy to read >> and understand. But MARC is well, well-documented, with more than one >> free implementation in cursory searching. And once you know a binary >> file's format, it's no harder to parse than XML, and the data's >> smaller and processing faster. >> >> So... why the XML? >> >> Curious, >> -Nate
Re: [CODE4LIB] MARCXML - What is it for?
Dear Nate, There is a trade-off: do you want very fast processing of data -> go for binary data. do you want to share your data globally easily in many (not per se library related) environments -> go for XML/RDF. Open your data and do both :-) Pat Sent from my iPhone On 25 Oct 2010, at 20:39, "Nate Vack" wrote: > Hi all, > > I've just spent the last couple of weeks delving into and decoding a > binary file format. This, in turn, got me thinking about MARCXML. > > In a nutshell, it looks like it's supposed to contain the exact same > data as a normal MARC record, except in XML form. As in, it should be > round-trippable. > > What's the advantage to this? I can see using a human-readable format > for poorly-documented file formats -- they're relatively easy to read > and understand. But MARC is well, well-documented, with more than one > free implementation in cursory searching. And once you know a binary > file's format, it's no harder to parse than XML, and the data's > smaller and processing faster. > > So... why the XML? > > Curious, > -Nate
Re: [CODE4LIB] MARCXML - What is it for?
I'm not a big user of MARCXML, but I can think of a few reasons off the top of my head: - Existing libraries for reading, manipulating and searching XML-based documents are very mature. - Documents can be validated for their "well-formedness" using these existing tools and a pre-defined schema (a validator for MARC would need to be custom-coded) - MARCXML can easily be incorporated into XML-based meta-metadata schemas, like METS. - It can be parsed and manipulated in a web service context without sending a binary blob over the wire. - XML is self-describing, binary is not. There's nothing stopping you from reading the MARCXML into a binary blob and working on it from there. But when sharing documents from different institutions around the globe, using a wide variety of tools and techniques, XML seems to be the lowest common denominator. -Andrew On 2010-10-25, at 2:38 PM, Nate Vack wrote: > Hi all, > > I've just spent the last couple of weeks delving into and decoding a > binary file format. This, in turn, got me thinking about MARCXML. > > In a nutshell, it looks like it's supposed to contain the exact same > data as a normal MARC record, except in XML form. As in, it should be > round-trippable. > > What's the advantage to this? I can see using a human-readable format > for poorly-documented file formats -- they're relatively easy to read > and understand. But MARC is well, well-documented, with more than one > free implementation in cursory searching. And once you know a binary > file's format, it's no harder to parse than XML, and the data's > smaller and processing faster. > > So... why the XML? > > Curious, > -Nate
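One distinction worth keeping straight in the second bullet: well-formedness needs no schema at all (any XML parser checks it), while validity against the MARCXML schema is a separate step that needs a schema-aware validator. A minimal Python sketch of the first half, using only the standard library; `is_well_formed` and the two sample strings are made up for illustration, and the standard library does not do XSD validation, so that part is left out.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """True if the text parses as XML; says nothing about schema validity."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical snippets: the second one never closes its <leader> element.
good = (
    '<record xmlns="http://www.loc.gov/MARC21/slim">'
    "<leader>00000nam a2200000 a 4500</leader></record>"
)
bad = "<record><leader>00000nam a2200000 a 4500</record>"

good_ok = is_well_formed(good)
bad_ok = is_well_formed(bad)
```

A record can pass this check and still be nonsense as MARC (wrong tags, missing leader), which is where a schema or something like MARC::Lint comes in.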
Re: [CODE4LIB] MARCXML - What is it for?
MARC records break parsing far too frequently. Apart from requiring no truly specialized tools, MARCXML should -- should! -- eliminate many of those problems. That's not to mention that MARC character sets vary a lot (DanMARC anyone?), and more even in practice than in theory. From my perspective the problem is simply that MARCXML isn't as ubiquitous as MARC. For what we do, at least, there's no point. We'd need to parse non-XML MARC data anyway. So if we're going to do it, we might as well do it for everything. Best, Tim On Mon, Oct 25, 2010 at 2:38 PM, Nate Vack wrote: > Hi all, > > I've just spent the last couple of weeks delving into and decoding a > binary file format. This, in turn, got me thinking about MARCXML. > > In a nutshell, it looks like it's supposed to contain the exact same > data as a normal MARC record, except in XML form. As in, it should be > round-trippable. > > What's the advantage to this? I can see using a human-readable format > for poorly-documented file formats -- they're relatively easy to read > and understand. But MARC is well, well-documented, with more than one > free implementation in cursory searching. And once you know a binary > file's format, it's no harder to parse than XML, and the data's > smaller and processing faster. > > So... why the XML? > > Curious, > -Nate > -- Check out my library at http://www.librarything.com/profile/timspalding
[CODE4LIB] MARCXML - What is it for?
Hi all, I've just spent the last couple of weeks delving into and decoding a binary file format. This, in turn, got me thinking about MARCXML. In a nutshell, it looks like it's supposed to contain the exact same data as a normal MARC record, except in XML form. As in, it should be round-trippable. What's the advantage to this? I can see using a human-readable format for poorly-documented file formats -- they're relatively easy to read and understand. But MARC is well, well-documented, with more than one free implementation in cursory searching. And once you know a binary file's format, it's no harder to parse than XML, and the data's smaller and processing faster. So... why the XML? Curious, -Nate
Re: [CODE4LIB] Help with DLF-ILS GetAvailability
Emily Lynema wrote: standardized metadata! While we had envisioned using something like MARCXML or ISO Holdings here to express things like serial runs, there Kind of a side note, but please consider ONIX Serial Holdings for expressing serial runs! It is by far the best schema I've seen for doing this -- simple for simple cases, flexible for other cases, actually DOES express things in a machine-interpretable way. Everything else I've seen is both way too complicated, even for simple cases, and often ends up expressing holdings in a way that a machine can't act upon anyway.
Re: [CODE4LIB] Help with DLF-ILS GetAvailability
I agree with Jonathan and David. The only reason there are no examples of including within is because no one thought of a use case for why you would do that. The xsd for explicitly states that it is simply "Metadata must be expressed in XML that complies with another XML Schema (namespace=#other). Metadata must be explicitly qualified in the response." So the only restriction is that it's some kind of standardized metadata! While we had envisioned using something like MARCXML or ISO Holdings here to express things like serial runs, there is no reason that simpleavailability could not be employed to describe a different kind of collection of items. The and are after all intended to represent a collection of items, and as David points out, the ISO Holdings schema explicitly allows for collection-level availability summary. And I will also note that ISO Holdings certainly does express availability in addition to 'holdings'; they are really one and the same thing. I guess I should note that I was a member of the original DLF group, so I suppose this is a fairly authoritative perspective on the original intent of the elements. :) -emily -- Date: Thu, 21 Oct 2010 16:26:54 -0400 From: Jonathan Rochkind Subject: Re: Help with DLF-ILS GetAvailability I don't think that's an abuse. I consider to be for information about a "holdingset", or some collection of "items", while is for information about an individual item. I think regardless of what you do you are being over-optimistic in thinking that if you just "do dlf", your stuff will interchangeable with any other clients or servers "doing dlf". The spec is way too open-ended for that, it leaves a whole bunch of details not specified and up to the implementer. For better or worse. I made more comments about this in the blog post I referenced earlier. Jonathan Owen Stephens wrote: > Thanks Dave, > > Yes - my reading was that dlf:holdings was for pure 'holdings' as opposed to > 'availability'. 
We could put the simpleavailability in there I guess but as > you say since we are controlling both ends then there doesn't seem any point > in abusing it like that. The downside is we'd hoped to do something that > could be taken by other sites - the original plan was to use the Juice > framework - developed by Talis using jQuery to parse a standard availability > format so that this could then be applied easily in other environments. > Obviously we can still achieve the outcome we need for the immediate > requirements of the project by using a custom format. > > Thanks again > > Owen > > > On Thu, Oct 21, 2010 at 4:28 PM, Walker, David wrote: > > >> Hey Owen, >> >> Seems like the you could use the element to hold this kind >> of individual library information. >> >> The DLF-ILS documentation doesn't seem to think that you would use >> dlf:simpleavailability here, though, but rather MARC or ISO holdings >> schemas. >> >> But if you're controlling both ends of the communication, I don't know if >> it really matters. >> >> --Dave >> >> == >> David Walker >> Library Web Services Manager >> California State University >> http://xerxes.calstate.edu >> >> From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Owen >> Stephens [o...@ostephens.com] >> Sent: Wednesday, October 20, 2010 12:22 PM >> To: CODE4LIB@LISTSERV.ND.EDU >> Subject: [CODE4LIB] Help with DLF-ILS GetAvailability >> >> I'm working with the University of Oxford to look at integrating some >> library services into their VLE/Learning Management System (Sakai). One of >> the services is something that will give availability for items on a reading >> list in the VLE (the Sakai 'Citation Helper'), and I'm looking at the >> DLF-ILS GetAvailability specification to achieve this. >> >> For physical items, the availability information I was hoping to use is >> expressed at the level of a physical collection. 
For example, if several >> college libraries within the University I have aggregated information that >> tells me the availability of the item in each of the college libraries. >> However, I don't have item level information. >> >> I can see how I can use simpleavailability to say over the entire >> institution whether (e.g.) a book is available or not. However, I'm not >> clear I can express this in a more granular way (say availability on a >> library by library basis) except by going to item level. Also although it >> seems you can express multiple locations in simpleavailability, and multiple >> availabilitymsg, there is no way I can see to link these, so although I >> could list each location OK, I can't attach an availabilitymsg to a specific >> location (unless I only express one location). >> >> Am I missing something, or is my interpretation correct? >> >> Any other suggestions? >> >> Thanks, >> >> Owen >> >> PS also looked at DAIA which I like, but this (as far as I can tel
[CODE4LIB] testing testing testing - Solr indexing software
I just finished a bunch of blog posts about the sorts of tests to write for Solr indexing software. Comments are welcome. Try not to drool when you fall asleep on your keyboard. Start with this one: http://discovery-grindstone.blogspot.com/2010/10/testing-solr-indexing-software.html - Naomi
[CODE4LIB] (LC) call number searching in Solr
I recently set up a testing framework allowing me to twiddle Solr knobs until I met acceptance criteria for LC call number searching. I came up with two Solr field types that worked for my criteria. You can read all about it here: http://discovery-grindstone.blogspot.com/2010/10/lc-call-number-searching-in-solr.html - Naomi
Re: [CODE4LIB] Django
On Mon, Oct 25, 2010 at 9:19 AM, Junior Tidal wrote: > Does anyone have any recommendations for learning Django? Books, websites, > video tutorials, etc. ... For resources, "learn django" in Google shows a bunch of promising hints. Methodology-wise: Start with a fairly concrete, well-defined problem. Have a product in mind before you start. Work hard with the tool you choose to make your product. Don't stress about whether you've chosen the best tools (you haven't) or whether you're doing it perfectly (you aren't). Make the thing. You can spend months looking over example code and tutorials and blog posts and not learn nearly as much as you would attacking the problem. Plus, you've gotten closer to solving the problem as you've learned. Or, DHH says it a bit better: http://37signals.com/svn/posts/2582-how-do-i-learn-to-program Cheers, -Nate
Re: [CODE4LIB] VPN vs. Proxy - Quick Question
We have VPN and Proxy (III WAM) available here, although for our online resources VPN doesn't get you anything special; you still go through the proxy. The regular URLs and Proxy URLs are in a PostgreSQL database, and the page with the links to online resources is dynamically fed based on your IP (HTTP variable HTTP_X_FORWARDED_FOR). Apache forwards all requests to the Zope server, so that's why I'm not checking the REMOTE_ADDR variable. If your IP is not in our domain, that is, if the first two octets don't match, then you get a proxy link which goes to our III authentication page. Online resources that are free get the same URL for on campus and off campus, not a PROXY link. I use a simple Python script to check the HTTP variable 'HTTP_X_FORWARDED_FOR' and return 0 or 1 in the variable 'hostname' to a Zope (Python-based web server) page. A simple IF conditional statement determines which URL to display based on the return value of the script. # call the python script ip_add_flag and set the return value to the variable hostname The campus offers a VPN service, but you don't get the usual campus domain IP, so we handle it the same as any other off-campus IP; our vendors are not given this range either, so it is not in the group of IPs for licensing certain databases. As far as user complaints, we have a form; a small group of people here receive those submissions, put them into TRAC, and individually work through the issues. Don't know the ratio of Proxy:VPN users; I don't have a definitive range of VPN IPs to work with. The campus VPN is used to access certain servers that are not normally accessible off campus because of the vlan they are in. Thomas On Monday 25 October 2010 09:33:55 Tim McGeary wrote: > Hi all, > > I realize that some of you may not directly deal with this issue, but I > was wondering if I could get some quick replies about how your > institutions are handling access to off-campus resources via VPN and Proxy. 
> > Do you offer a VPN service? If so, do you split-tunnel the traffic so > that the VPN only handles traffic to inside your campus IP? If you > split-tunnel, do users complain about not being able to connect to > external library resources (databases, journals, etc)? > > Do you offer a Proxy service? Will your proxy service work for users > already connected to VPN? > > Do you know an estimated ratio of Proxy:VPN users? > > Thanks, > Tim > -- Thomas McMillan Grant Bennett Operations & Systems Analyst University Library Appalachian State University P O Box 32026 Boone, North Carolina 28608 (828) 262 6587 Library Systems Help Desk: https://www.library.appstate.edu/help/
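The IP check Thomas describes in the message above (read HTTP_X_FORWARDED_FOR, compare the first two octets against the campus range, return 0 or 1, then pick a URL) could be sketched in Python roughly as follows. This is only an illustration, not his actual script: the name ip_add_flag and the 0/1 return value come from the message, but the prefix 192.0 is a placeholder from a documentation address range, choose_url is a hypothetical helper, and treating 1 as "on campus" is an assumption. The sketch also assumes a single trusted front-end proxy that appends the address it saw to the header, so it reads the last entry.

```python
# Hypothetical campus prefix (first two octets); 192.0.2.0/24 is a
# documentation-only range, not any real campus network.
CAMPUS_PREFIX = ("192", "0")


def ip_add_flag(forwarded_for, campus_prefix=CAMPUS_PREFIX):
    """Return 1 if the client looks on campus, 0 otherwise.

    forwarded_for is the raw HTTP_X_FORWARDED_FOR value. Assumption: one
    trusted proxy (Apache) appends the address it saw, so the last entry
    in a comma-separated list is the one to check.
    """
    if not forwarded_for:
        return 0  # no header: treat as off campus
    client_ip = forwarded_for.split(",")[-1].strip()
    octets = client_ip.split(".")
    if len(octets) != 4:
        return 0  # not a dotted-quad IPv4 address
    return 1 if tuple(octets[:2]) == tuple(campus_prefix) else 0


def choose_url(hostname, regular_url, proxy_url, is_free=False):
    """Pick the link to display (hypothetical helper).

    Free resources always get the regular URL; otherwise off-campus
    users (hostname == 0) get the proxy link.
    """
    if is_free or hostname == 1:
        return regular_url
    return proxy_url
```

In the setup described, the equivalent of choose_url is the IF conditional in the Zope page, with the two URLs coming from the PostgreSQL table. Note that a client can supply its own X-Forwarded-For header, so which entry is trustworthy depends on how the front-end proxy is configured.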
Re: [CODE4LIB] Django
I'd start here: http://docs.djangoproject.com/en/1.2/ There are some tutorials in there as well. -Mike On Mon, Oct 25, 2010 at 10:19, Junior Tidal wrote: > Hello Code4Lib, > > Does anyone have any recommendations for learning Django? Books, websites, > video tutorials, etc. ... > > thanks, > > Junior Tidal > Assistant Professor > Web Services and Multimedia Librarian > New York City College of Technology, CUNY > 300 Jay Street > Brooklyn, NY 11210 > 718.260.5481 > > http://library.citytech.cuny.edu >
Re: [CODE4LIB] Django
There's the Django Book: http://www.djangobook.com/ (Make sure you choose the revised edition for 1.0) The Django docs, with some intro tutorials: http://docs.djangoproject.com/en/1.2/ Did you try those already? On 2010-10-25, at 10:19 AM, Junior Tidal wrote: > Hello Code4Lib, > > Does anyone have any recommendations for learning Django? Books, websites, > video tutorials, etc. ... > > thanks, > > Junior Tidal > Assistant Professor > Web Services and Multimedia Librarian > New York City College of Technology, CUNY > 300 Jay Street > Brooklyn, NY 11210 > 718.260.5481 > > http://library.citytech.cuny.edu
[CODE4LIB] Django
Hello Code4Lib, Does anyone have any recommendations for learning Django? Books, websites, video tutorials, etc. ... thanks, Junior Tidal Assistant Professor Web Services and Multimedia Librarian New York City College of Technology, CUNY 300 Jay Street Brooklyn, NY 11210 718.260.5481 http://library.citytech.cuny.edu
Re: [CODE4LIB] Simple Flexible ILS written in Django
Neat. If you put this into production at a public URL anytime, do let us know. Elliot Hallmark wrote: Re: simple, flexible ILS for small library hello all, Just wanted to mention that I did decide to code an ILS for a book sharing library. Tweaking a conventional ILS or bartering software would not accomplish what I want, and programmatically the problem isn't very difficult. Modern programming frameworks make building something like this very quick. The program currently has all the basic functionality I need, including user front ends for checkin/checkout, adding new items from downloaded MARC records, and a powerful backend for fixing anything that a user (librarian) did accidentally or shouldn't have permission to do easily. I am doing this in Django because Python is awesome. There are more defined reasons people use Django, but I will admit that I would never have done any substantial programming if I had not found Python. Every time a project comes up, Python just happens to have a set of tools that go well beyond what I need from them. The code is available for anyone: http://bitbucket.org/permafacture/django-ils/ I'll gladly explain/document more if anyone cares to hear -Elliot Hallmark PS: I am now using a cleaner email address. Previously I was using an address that I don't mind getting a little spammy because I was just poking around. I am the same person as offonoffoffonoff at gmail.
[CODE4LIB] VPN vs. Proxy - Quick Question
Hi all, I realize that some of you may not directly deal with this issue, but I was wondering if I could get some quick replies about how your institutions are handling access to off-campus resources via VPN and Proxy. Do you offer a VPN service? If so, do you split-tunnel the traffic so that the VPN only handles traffic to inside your campus IP? If you split-tunnel, do users complain about not being able to connect to external library resources (databases, journals, etc)? Do you offer a Proxy service? Will your proxy service work for users already connected to VPN? Do you know an estimated ratio of Proxy:VPN users? Thanks, Tim -- Tim McGeary Team Leader, Library Technology Lehigh University 610-758-4998 tim.mcge...@lehigh.edu timmcge...@gmail.com GTalk/Yahoo/Skype: timmcgeary
[CODE4LIB] IR+ 2.0 now available for download
The University of Rochester is pleased to announce the open source release of its institutional repository IR+. Following a successful production launch on Tuesday October 12th, IR+ 2.0 is now available to the entire community.

IR+ was born from user research. With portfolios, personal workspaces, and publication listings, it offers useful tools for researchers and extends the role of the repository into the authoring process. It is a fully featured digital repository management solution that is easy for users to understand and manage. Its goal is to meet the needs of any organization that needs to author, publish and preserve digital information.

The download and documentation can be found at http://www.irplus.org

The new version has many new features and updates. These include:

- OAI-PMH harvestable
- Dublin Core mapping features for identifiers and contributors
- Improved batch metadata manipulation
- Automated re-indexing enhancements (changing control lists forces re-indexing of all items that use changed data)
- Sponsor browsing / statistics
- Paging and sorting for contributor pages
- Improved Search Engine Optimization (SEO) for better indexing of researcher pages and content within the repository
- Researcher page interface enhancements
- Content type listing and filtering at the repository and collection levels
- Content type counts at the repository and collection levels
- Increased download information and removal options for more accurate download counts
- Updated Help, Installation and User manuals
- RSS feeds for Collections/Contributor Pages
- Upgraded PDF/Word/Excel/PowerPoint text extraction libraries
- Updated user account management features
- Submission performance enhancements
- Improved home page module placement
- Improved change tracking

We are pleased with the current faculty and student interest in our IR+ installation, UR Research, which has over two thousand registered users, over one million downloads and thirty-six public researcher pages.

Please feel free to contact me if you have any questions.

Nathan Sarr
Senior Software Engineer
River Campus Libraries
University of Rochester
Rochester, NY 14627
(585) 275-0692