Re: [CODE4LIB] MARCXML - What is it for?

2010-11-19 Thread Smith,Devon
From: Nate Vack. Sent: Friday, November 19, 2010 12:34 PM. On Mon, Oct 25, 2010 at 2:22 PM, Eric Hellman e...@hellman.net wrote: I think you'd have a very hard time

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-28 Thread Cory Rockliff
I've only just had a chance to catch up on this thread. I'm not offended in the least by Turbomarc (anything round-trippable should serve just as well as an internal representation of MARC, right?), but I am a little puzzled--what are the 'special cases' alluded to in the blog post? When

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-28 Thread MJ Suhonos
Let me openly state that I've never used Turbomarc. I believe the special case they are referring to is the subfield code with a value of η, which is non-alphanumeric. I don't know enough about MARC to even begin guessing what this means or why it might occur (or not). The use case I see for

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-28 Thread Mike Taylor
On 28 October 2010 17:37, MJ Suhonos m...@suhonos.ca wrote: Let me openly state that I've never used Turbomarc.  I believe the special case they are referring to is the subfield code with a value of η, which is non-alphanumeric.  I don't know enough about MARC to even begin guessing what

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-28 Thread MJ Suhonos
The first comment claims a 30-40% increase in XML parsing, which seems obvious when you compare the number of characters in the example provided: 277 vs. 419, or about 34% fewer going through the parser. The speedup can be much greater than that -- from the blog post itself, Using
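The "about 34% fewer" figure can be checked directly from the two character counts quoted in the message. A minimal sketch; the variable names are made up here, and which count belongs to which format is not stated in the snippet, so they are simply called smaller and larger:

```python
# Character counts quoted in the message above.
smaller_example = 277
larger_example = 419


def fraction_fewer(smaller, larger):
    """Return the relative reduction in size going from larger to smaller."""
    return (larger - smaller) / larger


reduction = fraction_fewer(smaller_example, larger_example)
print(f"{reduction:.1%}")  # roughly a third fewer characters
```

This only confirms the size arithmetic. It says nothing about parse time, which the message notes can improve by much more than the size difference alone.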

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-27 Thread Walker, David
From: Alexander Johannesen [alexander.johanne...@gmail.com]. Sent: Monday, October 25, 2010 7:10 PM. On Tue, Oct 26, 2010 at 12:48 PM, Bill Dueber b...@dueber.com wrote: Here, I think you're guilty of radically underestimating lots

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-27 Thread Smith,Devon
From: Walker, David. Sent: Monday, October 25, 2010 8:57 PM. b) expanding it to be actually useful and interesting. But here I think you've missed the very utility of MARC-XML. Let's say you

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-27 Thread MJ Suhonos
But it looks just like the old thing using insert data scheme and some templates? Ah yes, but now we're doing it in XML! I think this applies to 90% of instances where XML was adopted, especially within the enterprise IT industry. Through marketing or misunderstanding, XML was

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-27 Thread Ross Singer
Alex, I think the problem is data like this: http://lccn.loc.gov/96516389/marcxml And while we can probably figure out a pattern to get the semantics out of this record, there is no telling how many other variations exist within our collections. So we've got lots of this data that is both hard to

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-27 Thread Kyle Banerjee
This is no justification for not doing things better. (And I'd love to know what the hard bits are; always interesting to hear from various people as to what they think are the *real* problems of library problems, as opposed to any other problem they have) The problem is you have to deal

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-27 Thread Alexander Johannesen
Hi, On Tue, Oct 26, 2010 at 1:23 PM, Bill Dueber b...@dueber.com wrote: Sorry. That was rude, and uncalled for. I disagree that the problem is easily solved, even without the politics. There've been lots of attempts to try to come up with a sufficiently expressive toolset for dealing with

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-27 Thread Toke Eskildsen
On Tue, 2010-10-26 at 03:32 +0200, Alexander Johannesen wrote: Here's our new thing. And we did it by simply converting all our MARC into MARCXML that runs on a cron job every midnight, and a bit of horrendous XSLT that's impossible to maintain. I am in the development department of our

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-27 Thread Boheemen, Peter van
I think: 1. Marc must die. It has lived long enough. 2. But everybody uses Marc (which is in fact good), too many people are keeping it alive. 3. MARC in XML does not solve the problem, but it makes the suffering so much less painful Peter

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-27 Thread Alexander Johannesen
Political? For sure. Engineering? Not so much. Ok. Solve it. Let us know when you're done. Wow, lamest reply so far. Surely you could muster a tad bit better? I was excited about getting a list of the hardest problems, for example, I'd love to see that. Then by that perhaps you could explain

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-27 Thread Andrew Cunningham
I'd suspect that MARCXML isn't going anywhere fast, a shame perhaps. The key difference between MARCXML and MARC is that MARCXML inherits XML's internationalisation features. It is an aspect at which MARC is very poor. Andrew -- Andrew Cunningham Senior Project Manager, Research and

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-27 Thread Richard, Joel M
On Oct 25, 2010, at 10:31 PM, Alexander Johannesen wrote: Political? For sure. Engineering? Not so much. Ok. Solve it. Let us know when you're done. Wow, lamest reply so far. Surely you could muster a tad bit better? I was excited about getting a list of the hardest problems, for example,

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-27 Thread Walker, David
From: Smith,Devon [smit...@oclc.org]. Sent: Tuesday, October 26, 2010 7:44 AM. One way is to first transform the MARC into MARC-XML. Then you can use XSLT to crosswalk the MARC-XML into that other schema. Very handy. Your

[CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Nate Vack
Hi all, I've just spent the last couple of weeks delving into and decoding a binary file format. This, in turn, got me thinking about MARCXML. In a nutshell, it looks like it's supposed to contain the exact same data as a normal MARC record, except in XML form. As in, it should be

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Tim Spalding
MARC records break parsing far too frequently. Apart from requiring no truly specialized tools, MARCXML should—should!—eliminate many of those problems. That's not to mention that MARC character sets vary a lot (DanMARC anyone?), and more even in practice than in theory. From my perspective the

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Andrew Hankinson
I'm not a big user of MARCXML, but I can think of a few reasons off the top of my head: - Existing libraries for reading, manipulating and searching XML-based documents are very mature. - Documents can be validated for their well-formedness using these existing tools and a pre-defined schema

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Patrick Hochstenbach
Dear Nate, There is a trade-off: do you want very fast processing of data - go for binary data. do you want to share your data globally easily in many (not per se library related) environments - go for XML/RDF. Open your data and do both :-) Pat Sent from my iPhone On 25 Oct 2010, at 20:39,

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread MJ Suhonos
It's helpful to think of MARCXML as a sort of lingua franca. - Existing libraries for reading, manipulating and searching XML-based documents are very mature. Including XSLT and XPath; very powerful stuff. There's nothing stopping you from reading the MARCXML into a binary blob and working

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Tim Spalding
- XML is self-describing, binary is not. Not to quibble, but that's only in a theoretical sense here. Something like Amazon XML is truly self-describing. MARCXML is self-obfuscating. At least MARC records kinda imitate catalog cards. :) Tim On Mon, Oct 25, 2010 at 2:50 PM, Andrew Hankinson

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Bryan Baldus
On Monday, October 25, 2010 1:50 PM, Andrew Hankinson wrote: - Documents can be validated for their well-formedness using these existing tools and a pre-defined schema (a validator for MARC would need to be custom-coded) In Perl, MARC::Lint might be an example of such a validator (though I

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Eric Hellman
I think you'd have a very hard time demonstrating any speed advantage to MARC over MARCXML. XML parsers have been speed optimized out the wazoo; If there exists a MARC parser that has ever been speed-optimized without serious compromise, I'm sure someone on this list will have a good story

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Nate Vack
On Mon, Oct 25, 2010 at 2:09 PM, Tim Spalding t...@librarything.com wrote: - XML is self-describing, binary is not. Not to quibble, but that's only in a theoretical sense here. Something like Amazon XML is truly self-describing. MARCXML is self-obfuscating. At least MARC records kinda imitate

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Alexander Johannesen
Hiya, On Tue, Oct 26, 2010 at 6:26 AM, Nate Vack njv...@wisc.edu wrote: Switching to an XML format doesn't help with that at all. I'm willing to take it further and say that MARCXML was the worst thing the library world ever did. Some might argue it was a good first step, and that it was better

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Andrew Hankinson
I guess what I meant is that in MARCXML, you have a datafield element with subsequent subfield elements each with fairly clear attributes, which, while not my idea of fun Sunday-afternoon reading, requires less specialized tools to parse (hello Textmate!) and is a bit easier than trying to
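To illustrate the point about datafield and subfield elements being readable with ordinary XML tools, here is a minimal sketch using only Python's standard library. The sample record is invented for illustration, and the function name `datafields` is hypothetical. The namespace is the one used by the Library of Congress MARCXML ("MARC21 slim") schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the Library of Congress MARCXML ("MARC21 slim") schema.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical sample record, invented for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">example-0001</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title /</subfield>
    <subfield code="c">by an example author.</subfield>
  </datafield>
</record>"""


def datafields(xml_text):
    """Return a list of (tag, ind1, ind2, [(code, value), ...]) tuples."""
    root = ET.fromstring(xml_text)  # root is the <record> element
    out = []
    for df in root.findall("marc:datafield", NS):
        subs = [
            (sf.get("code"), sf.text or "")
            for sf in df.findall("marc:subfield", NS)
        ]
        out.append((df.get("tag"), df.get("ind1"), df.get("ind2"), subs))
    return out


if __name__ == "__main__":
    for tag, ind1, ind2, subs in datafields(SAMPLE):
        print(tag, ind1, ind2, subs)
```

No MARC-specific library is involved. That is the "less specialized tools" argument, although knowing what tag 245 or subfield a means still requires MARC knowledge.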

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread MJ Suhonos
I'll just leave this here: http://www.indexdata.com/blog/2010/05/turbomarc-faster-xml-marc-records That trade-off ought to offend both camps, though I happen to think it's quite clever. MJ On 2010-10-25, at 3:22 PM, Eric Hellman wrote: I think you'd have a very hard time demonstrating any

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Kyle Banerjee
On Mon, Oct 25, 2010 at 12:38 PM, Tim Spalding t...@librarything.com wrote: Does processing speed of something matter anymore? You'd have to be doing a LOT of processing to care, wouldn't you? Data migrations and data dumps are a common use case. Needing to break or make hundreds of thousands

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Tim Spalding
Does processing speed of something matter anymore? You'd have to be doing a LOT of processing to care, wouldn't you? Tim On Mon, Oct 25, 2010 at 3:35 PM, MJ Suhonos m...@suhonos.ca wrote: I'll just leave this here: http://www.indexdata.com/blog/2010/05/turbomarc-faster-xml-marc-records That

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Kyle Banerjee
On Mon, Oct 25, 2010 at 12:22 PM, Eric Hellman e...@hellman.net wrote: I think you'd have a very hard time demonstrating any speed advantage to MARC over MARCXML. XML parsers have been speed optimized out the wazoo; If there exists a MARC parser that has ever been speed-optimized without

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Jonathan Rochkind
Yes, it is designed to be a round-trippable expression of ordinary marc in XML. Some reasons this is useful: 1. No maximum record length, unlike actual marc which tops out at ~10k. 2. You can use XSLT and other XML tools to work with it, and store it in stores optimized for XML (or that only

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Jonathan Rochkind
MODS was an attempt to mostly-but-not-entirely-roundtrippably represent data in MARC in a format that's more 'normal' XML, without packed bytes in elements, with element names that are more or less self-documenting, etc. It's caught on even less than MARCXML though, so if you find MARCXML

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Jonathan Rochkind
Marc in JSON can be a nice middle-ground, faster/smaller than MarcXML (although still probably not as binary), based on a standard low-level data format so easier to work with using existing tools (and developers eyes) than binary, no maximum record length. There have been a couple competing
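As a rough sketch of what "MARC in JSON" might look like, the following converts a MARCXML record into a plain dictionary and serializes it with the standard `json` module. The JSON layout here (keys `leader`, `fields`, `subfields`) is an assumption made up for this example; the message mentions competing proposals, and this is not meant to match any of them. The sample record is also invented.

```python
import json
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical sample record, invented for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">example-0001</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title /</subfield>
  </datafield>
</record>"""


def marcxml_to_dict(xml_text):
    """Convert one MARCXML record into a JSON-friendly dict (assumed layout)."""
    root = ET.fromstring(xml_text)
    record = {
        "leader": root.findtext("marc:leader", default="", namespaces=NS),
        "fields": [],
    }
    # Control fields are collected first, then data fields.
    for cf in root.findall("marc:controlfield", NS):
        record["fields"].append({"tag": cf.get("tag"), "value": cf.text or ""})
    for df in root.findall("marc:datafield", NS):
        record["fields"].append({
            "tag": df.get("tag"),
            "ind1": df.get("ind1"),
            "ind2": df.get("ind2"),
            # A list of [code, value] pairs keeps order and repeated codes.
            "subfields": [
                [sf.get("code"), sf.text or ""]
                for sf in df.findall("marc:subfield", NS)
            ],
        })
    return record


if __name__ == "__main__":
    print(json.dumps(marcxml_to_dict(SAMPLE), indent=2))
```

Subfields are kept as ordered pairs rather than as an object keyed by code, since subfield codes can repeat within a field and their order can matter.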

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Jonathan Rochkind
Tim Spalding wrote: Does processing speed of something matter anymore? You'd have to be doing a LOT of processing to care, wouldn't you? Yes, which sometimes you are. Say, when you're indexing 2 or 3 or 10 million marc records into, say, solr. Which is faster depends on what language and

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread MJ Suhonos
JSON++ I routinely re-index about 2.5M JSON records (originally from binary MARC), and it's several orders of magnitude faster than XML (measured in single-digit minutes rather than double-digit hours). I'm not sure if it's in the same range as binary MARC, but as Tim says, it's plenty fast

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Stephen Meyer
Kyle Banerjee wrote: On Mon, Oct 25, 2010 at 12:38 PM, Tim Spalding t...@librarything.com wrote: Does processing speed of something matter anymore? You'd have to be doing a LOT of processing to care, wouldn't you? Data migrations and data dumps are a common use case. Needing to break or

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Ray Denenberg, Library of Congress
Hiya, On Tue, Oct 26, 2010 at 6:26 AM, Nate Vack njv...@wisc.edu wrote: Switching to an XML format doesn't help with that at all. I'm willing to take it further and say that MARCXML was the worst thing the library world ever did

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Alexander Johannesen
Ray Denenberg, Library of Congress r...@loc.gov wrote: It really is possible to make your point without being quite so obnoxious. Obnoxious? Alex --  Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps --- http://shelter.nu/blog/

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Bill Dueber
I know there are two parts of this discussion (speed on the one hand, applicability/features on the other), but for the former, running a little benchmark just isn't that hard. Aren't we supposed to, you know, prefer to make decisions based on data? Note: I'm only testing deserialization because
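In the spirit of "running a little benchmark", here is a minimal deserialization-only timing sketch using Python's standard library. It is not the benchmark from the message, whose details are cut off above. The record is synthetic, the JSON layout is an assumed one, and the helper names (`make_xml`, `make_json`) are hypothetical. Results will depend heavily on language, parser, and real record shapes.

```python
import json
import timeit
import xml.etree.ElementTree as ET

FIELD_COUNT = 20  # arbitrary size for the synthetic record


def make_xml(n):
    """Build a synthetic MARCXML-like record with n data fields."""
    fields = "".join(
        f'<datafield tag="{100 + i}" ind1=" " ind2=" ">'
        f'<subfield code="a">value {i}</subfield></datafield>'
        for i in range(n)
    )
    return f'<record xmlns="http://www.loc.gov/MARC21/slim">{fields}</record>'


def make_json(n):
    """Build the same synthetic data as a JSON string (assumed layout)."""
    fields = [
        {"tag": str(100 + i), "ind1": " ", "ind2": " ",
         "subfields": [["a", f"value {i}"]]}
        for i in range(n)
    ]
    return json.dumps({"fields": fields})


XML_TEXT = make_xml(FIELD_COUNT)
JSON_TEXT = make_json(FIELD_COUNT)

if __name__ == "__main__":
    runs = 2000
    # Time parsing only; no file I/O and no conversion to application objects.
    xml_seconds = timeit.timeit(lambda: ET.fromstring(XML_TEXT), number=runs)
    json_seconds = timeit.timeit(lambda: json.loads(JSON_TEXT), number=runs)
    print(f"XML : {xml_seconds:.4f} s for {runs} parses")
    print(f"JSON: {json_seconds:.4f} s for {runs} parses")
```

Swapping in a file of real records and the parser actually used in production would give numbers worth arguing about. A single synthetic record mostly shows that the measurement is cheap to set up.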

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Walker, David
From: Alexander Johannesen [alexander.johanne...@gmail.com]. Sent: Monday, October 25, 2010 12:38 PM. Hiya, On Tue, Oct 26, 2010 at 6:26 AM, Nate Vack njv...@wisc.edu wrote

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Eric Lease Morgan
On Oct 25, 2010, at 8:56 PM, Walker, David wrote: Your criticisms of MARC-XML all seem to presume that MARC-XML is the goal, the end point in the process. But MARC-XML is really better seen as a utility, a middle step between binary MARC and the real goal, which is some other useful and

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Alexander Johannesen
On Tue, Oct 26, 2010 at 11:56 AM, Walker, David dwal...@calstate.edu wrote: Your criticisms of MARC-XML all seem to presume that MARC-XML is the goal, the end point in the process.  But MARC-XML is really better seen as a utility, a middle step between binary MARC and the real goal, which is

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Bill Dueber
On Mon, Oct 25, 2010 at 9:32 PM, Alexander Johannesen alexander.johanne...@gmail.com wrote: Lots of people around the library world infra-structure will think that since your data is now in XML it has taken some important step towards being inter-operable with the rest of the world, that

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Alexander Johannesen
On Tue, Oct 26, 2010 at 12:48 PM, Bill Dueber b...@dueber.com wrote: Here, I think you're guilty of radically underestimating lots of people around the library world. No one thinks MARC is a good solution to our modern problems, and no one who actually knows what MARC is has trouble

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Dana Pearson
I'm not a coder but I undertook a study of XML some years after it came onto the scene and with a likely confused notion that it would be the next significant technology, I learned some XSL and later was able to weave PubMed Central journal information (CSV transformed into XML) together with

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Bill Dueber
On Mon, Oct 25, 2010 at 10:10 PM, Alexander Johannesen alexander.johanne...@gmail.com wrote: Political? For sure. Engineering? Not so much. Ok. Solve it. Let us know when you're done. -- Bill Dueber Library Systems Programmer University of Michigan Library

Re: [CODE4LIB] MARCXML - What is it for?

2010-10-25 Thread Bill Dueber
Sorry. That was rude, and uncalled for. I disagree that the problem is easily solved, even without the politics. There've been lots of attempts to try to come up with a sufficiently expressive toolset for dealing with biblio data, and we're still working on it. If you do think you've got some