Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Nate Vack
Sent: Friday, November 19, 2010 12:34 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARCXML - What is it for?
On Mon, Oct 25, 2010 at 2:22 PM, Eric Hellman e...@hellman.net wrote:
I think you'd have a very hard time
I've only just had a chance to catch up on this thread. I'm not
offended in the least by Turbomarc (anything round-trippable should
serve just as well as an internal representation of MARC, right?), but I
am a little puzzled--what are the 'special cases' alluded to in the blog
post? When
Let me openly state that I've never used Turbomarc. I believe the special
case they are referring to is the subfield code with a value of η, which is
non-alphanumeric. I don't know enough about MARC to even begin guessing what
this means or why it might occur (or not).
The use case I see for
On 28 October 2010 17:37, MJ Suhonos m...@suhonos.ca wrote:
Let me openly state that I've never used Turbomarc. I believe the special
case they are referring to is the subfield code with a value of η, which
is non-alphanumeric. I don't know enough about MARC to even begin guessing
what
The first comment claims a 30-40% increase in XML parsing speed, which seems
obvious when you compare the number of characters in the example provided:
277 vs. 419, or about 34% fewer characters going through the parser.
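For what it's worth, the arithmetic behind that figure is easy to check:

```python
# Byte counts from the Turbomarc example cited above: 419 bytes of MARCXML
# versus 277 bytes of Turbomarc for the same record.
marcxml_bytes = 419
turbomarc_bytes = 277

reduction = (marcxml_bytes - turbomarc_bytes) / marcxml_bytes
print(f"{reduction:.1%} fewer bytes through the parser")  # 33.9%
```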
The speedup can be much greater than that -- from the blog post
itself, Using
Johannesen [alexander.johanne...@gmail.com]
Sent: Monday, October 25, 2010 7:10 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARCXML - What is it for?
On Tue, Oct 26, 2010 at 12:48 PM, Bill Dueber b...@dueber.com wrote:
Here, I think you're guilty of radically underestimating lots
[mailto:code4...@listserv.nd.edu] On Behalf Of
Walker, David
Sent: Monday, October 25, 2010 8:57 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARCXML - What is it for?
b) expanding it to be actually useful and interesting.
But here I think you've missed the very utility of MARC-XML.
Let's say you
But it looks just like the old thing, using [insert data scheme] and some
templates?
Ah yes, but now we're doing it in XML!
I think this applies to 90% of instances where XML was adopted, especially
within the enterprise IT industry. Through marketing or misunderstanding,
XML was
Alex,
I think the problem is data like this:
http://lccn.loc.gov/96516389/marcxml
And while we can probably figure out a pattern to get the semantics
out this record, there is no telling how many other variations exist
within our collections.
So we've got lots of this data that is both hard to
This is no justification for not doing things better. (And I'd love to
know what the hard bits are; always interesting to hear from various
people as to what they think are the *real* problems of libraries, as
opposed to any other problems they have)
The problem is you have to deal
Hi,
On Tue, Oct 26, 2010 at 1:23 PM, Bill Dueber b...@dueber.com wrote:
Sorry. That was rude, and uncalled for. I disagree that the problem is
easily solved, even without the politics. There've been lots of attempts to
try to come up with a sufficiently expressive toolset for dealing with
On Tue, 2010-10-26 at 03:32 +0200, Alexander Johannesen wrote:
Here's our new thing. And we did it by simply converting all our
MARC into MARCXML that runs on a cron job every midnight, and a bit of
horrendous XSLT that's impossible to maintain.
I am in the development department of our
I think:
1. MARC must die. It has lived long enough.
2. But everybody uses MARC (which is in fact good); too many people are keeping
it alive.
3. MARC in XML does not solve the problem, but it makes the suffering so much
less painful.
Peter
Political? For sure. Engineering? Not so much.
Ok. Solve it. Let us know when you're done.
Wow, lamest reply so far. Surely you could muster a tad bit better? I
was excited about getting a list of the hardest problems, for example,
I'd love to see that. Then by that perhaps you could explain
I'd suspect that MARCXML isn't going anywhere fast, a shame perhaps.
The key difference between MARCXML and MARC is that MARCXML inherits
XML's internationalisation features, an area where MARC is very poor.
Andrew
--
Andrew Cunningham
Senior Project Manager, Research and
On Oct 25, 2010, at 10:31 PM, Alexander Johannesen wrote:
Political? For sure. Engineering? Not so much.
Ok. Solve it. Let us know when you're done.
Wow, lamest reply so far. Surely you could muster a tad bit better? I
was excited about getting a list of the hardest problems, for example,
On Behalf Of Smith, Devon
[smit...@oclc.org]
Sent: Tuesday, October 26, 2010 7:44 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARCXML - What is it for?
One way is to first transform the MARC into MARC-XML. Then you can
use XSLT to crosswalk the MARC-XML
into that other schema. Very handy.
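In production that crosswalk step is usually an XSLT stylesheet; as a rough sketch of what it does, here is the same idea in plain Python with the standard library parser (the sample record and the tiny MARC-to-Dublin-Core mapping are illustrative, not a real crosswalk):

```python
import xml.etree.ElementTree as ET

# A minimal MARCXML fragment (hypothetical sample record).
MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Moby Dick /</subfield>
    <subfield code="c">Herman Melville.</subfield>
  </datafield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Melville, Herman</subfield>
  </datafield>
</record>"""

NS = {"m": "http://www.loc.gov/MARC21/slim"}

# MARC (tag, subfield) -> target element; a real crosswalk has many more rules.
CROSSWALK = {("245", "a"): "dc:title", ("100", "a"): "dc:creator"}

def crosswalk(xml_text):
    record = ET.fromstring(xml_text)
    out = {}
    for df in record.findall("m:datafield", NS):
        for sf in df.findall("m:subfield", NS):
            key = (df.get("tag"), sf.get("code"))
            if key in CROSSWALK:
                out[CROSSWALK[key]] = sf.text
    return out

print(crosswalk(MARCXML))
# {'dc:title': 'Moby Dick /', 'dc:creator': 'Melville, Herman'}
```

The point stands either way: once the data is in XML, the mapping is a mechanical walk over well-labelled elements rather than byte-offset arithmetic.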
Your
Hi all,
I've just spent the last couple of weeks delving into and decoding a
binary file format. This, in turn, got me thinking about MARCXML.
In a nutshell, it looks like it's supposed to contain the exact same
data as a normal MARC record, except in XML form. As in, it should be
MARC records break parsing far too frequently. Apart from requiring no
truly specialized tools, MARCXML should—should!—eliminate many of
those problems. That's not to mention that MARC character sets vary a
lot (DanMARC anyone?), and vary even more in practice than in theory.
From my perspective the
I'm not a big user of MARCXML, but I can think of a few reasons off the top of
my head:
- Existing libraries for reading, manipulating and searching XML-based
documents are very mature.
- Documents can be validated for their well-formedness using these existing
tools and a pre-defined schema
Dear Nate,
There is a trade-off: do you want very fast processing of data - go for binary
data. do you want to share your data globally easily in many (not per se
library related) environments - go for XML/RDF.
Open your data and do both :-)
Pat
Sent from my iPhone
On 25 Oct 2010, at 20:39,
It's helpful to think of MARCXML as a sort of lingua franca.
- Existing libraries for reading, manipulating and searching XML-based
documents are very mature.
Including XSLT and XPath; very powerful stuff.
There's nothing stopping you from reading the MARCXML into a binary blob and
working
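The XSLT/XPath point is easy to illustrate: even the limited XPath subset in Python's standard library is enough to pull fields out of MARCXML by tag and subfield code (the sample record is hypothetical):

```python
import xml.etree.ElementTree as ET

MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Whaling</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Sea stories</subfield>
  </datafield>
</record>"""

NS = {"m": "http://www.loc.gov/MARC21/slim"}
record = ET.fromstring(MARCXML)

# One path expression pulls every 650 $a; no MARC-specific library needed.
subjects = [sf.text for sf in
            record.findall("m:datafield[@tag='650']/m:subfield[@code='a']", NS)]
print(subjects)  # ['Whaling', 'Sea stories']
```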
- XML is self-describing, binary is not.
Not to quibble, but that's only in a theoretical sense here. Something
like Amazon XML is truly self-describing. MARCXML is self-obfuscating.
At least MARC records kinda imitate catalog cards.
:)
Tim
On Mon, Oct 25, 2010 at 2:50 PM, Andrew Hankinson
On Monday, October 25, 2010 1:50 PM, Andrew Hankinson wrote:
- Documents can be validated for their well-formedness using these existing
tools and a pre-defined schema (a validator for MARC would need to be
custom-coded)
In Perl, MARC::Lint might be an example of such a validator (though I
I think you'd have a very hard time demonstrating any speed advantage to MARC
over MARCXML. XML parsers have been speed optimized out the wazoo; if there
exists a MARC parser that has ever been speed-optimized without serious
compromise, I'm sure someone on this list will have a good story
On Mon, Oct 25, 2010 at 2:09 PM, Tim Spalding t...@librarything.com wrote:
- XML is self-describing, binary is not.
Not to quibble, but that's only in a theoretical sense here. Something
like Amazon XML is truly self-describing. MARCXML is self-obfuscating.
At least MARC records kinda imitate
Hiya,
On Tue, Oct 26, 2010 at 6:26 AM, Nate Vack njv...@wisc.edu wrote:
Switching to an XML format doesn't help with that at all.
I'm willing to take it further and say that MARCXML was the worst
thing the library world ever did. Some might argue it was a good first
step, and that it was better
I guess what I meant is that in MARCXML, you have a datafield element with
subsequent subfield elements each with fairly clear attributes, which, while
not my idea of fun Sunday-afternoon reading, requires less specialized tools to
parse (hello Textmate!) and is a bit easier than trying to
I'll just leave this here:
http://www.indexdata.com/blog/2010/05/turbomarc-faster-xml-marc-records
That trade-off ought to offend both camps, though I happen to think it's quite
clever.
MJ
On 2010-10-25, at 3:22 PM, Eric Hellman wrote:
I think you'd have a very hard time demonstrating any
On Mon, Oct 25, 2010 at 12:38 PM, Tim Spalding t...@librarything.com wrote:
Does processing speed of something matter anymore? You'd have to be
doing a LOT of processing to care, wouldn't you?
Data migrations and data dumps are a common use case. Needing to break or
make hundreds of thousands
Does processing speed of something matter anymore? You'd have to be
doing a LOT of processing to care, wouldn't you?
Tim
On Mon, Oct 25, 2010 at 3:35 PM, MJ Suhonos m...@suhonos.ca wrote:
I'll just leave this here:
http://www.indexdata.com/blog/2010/05/turbomarc-faster-xml-marc-records
That
On Mon, Oct 25, 2010 at 12:22 PM, Eric Hellman e...@hellman.net wrote:
I think you'd have a very hard time demonstrating any speed advantage to
MARC over MARCXML. XML parsers have been speed optimized out the wazoo; if
there exists a MARC parser that has ever been speed-optimized without
Yes, it is designed to be a round-trippable expression of ordinary MARC
in XML. Some reasons this is useful:
1. No maximum record length, unlike actual MARC, which tops out just under
100k (the leader stores the record length in five digits).
2. You can use XSLT and other XML tools to work with it, and store it in
stores optimized for XML (or that only
MODS was an attempt to mostly-but-not-entirely-roundtrippably represent
data in MARC in a format that's more 'normal' XML, without packed bytes
in elements, with element names that are more or less self-documenting,
etc. It's caught on even less than MARCXML though, so if you find
MARCXML
MARC in JSON can be a nice middle ground: faster/smaller than MARCXML
(although still probably not as fast or small as binary), and based on a
standard low-level data format, so it's easier to work with using existing
tools (and developers' eyes) than binary, with no maximum record length.
There have been a couple competing
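The marc-in-json idea can be made concrete; the exact shape below is illustrative (the competing proposals of the time differed in detail), but any such shape round-trips losslessly through a standard JSON library:

```python
import json

# One plausible shape: field order is preserved by using a list of
# single-key objects; the field content here is a made-up sample.
record = {
    "leader": "00000nam a2200000 a 4500",
    "fields": [
        {"001": "ocm12345"},
        {"245": {"ind1": "1", "ind2": "0",
                 "subfields": [{"a": "Moby Dick /"},
                               {"c": "Herman Melville."}]}},
    ],
}

serialized = json.dumps(record)
assert json.loads(serialized) == record  # round-trips losslessly
print(len(serialized), "bytes")
```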
Tim Spalding wrote:
Does processing speed of something matter anymore? You'd have to be
doing a LOT of processing to care, wouldn't you?
Yes, which sometimes you are. Say, when you're indexing 2 or 3 or 10
million MARC records into, say, Solr.
Which is faster depends on what language and
JSON++
I routinely re-index about 2.5M JSON records (originally from binary MARC), and
it's several orders of magnitude faster than XML (measured in single-digit
minutes rather than double-digit hours). I'm not sure if it's in the same
range as binary MARC, but as Tim says, it's plenty fast
Kyle Banerjee wrote:
On Mon, Oct 25, 2010 at 12:38 PM, Tim Spalding t...@librarything.com wrote:
Does processing speed of something matter anymore? You'd have to be
doing a LOT of processing to care, wouldn't you?
Data migrations and data dumps are a common use case. Needing to break or
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARCXML - What is it for?
Hiya,
On Tue, Oct 26, 2010 at 6:26 AM, Nate Vack njv...@wisc.edu wrote:
Switching to an XML format doesn't help with that at all.
I'm willing to take it further and say that MARCXML was the worst thing the
library world ever did
Ray Denenberg, Library of Congress r...@loc.gov wrote:
It really is possible to make your point without being quite so obnoxious.
Obnoxious?
Alex
--
Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
--- http://shelter.nu/blog/
I know there are two parts of this discussion (speed on the one hand,
applicability/features on the other), but for the former, running a little
benchmark just isn't that hard. Aren't we supposed to, you know, prefer to
make decisions based on data?
Note: I'm only testing deserialization because
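Bill's point about benchmarking is easy to act on. A minimal deserialization-only harness in Python (the toy single-record inputs are stand-ins for real MARC data, so the numbers only indicate relative cost on your machine):

```python
import json
import timeit
import xml.etree.ElementTree as ET

# Equivalent toy records; a real benchmark should loop over a large file
# of actual MARC data in both serializations.
XML_REC = ('<record><datafield tag="245"><subfield code="a">Moby Dick /'
           '</subfield></datafield></record>')
JSON_REC = '{"fields": [{"245": {"subfields": [{"a": "Moby Dick /"}]}}]}'

n = 10_000
xml_time = timeit.timeit(lambda: ET.fromstring(XML_REC), number=n)
json_time = timeit.timeit(lambda: json.loads(JSON_REC), number=n)

print(f"XML:  {xml_time:.3f}s for {n} parses")
print(f"JSON: {json_time:.3f}s for {n} parses")
```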
Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Alexander
Johannesen [alexander.johanne...@gmail.com]
Sent: Monday, October 25, 2010 12:38 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARCXML - What is it for?
Hiya,
On Tue, Oct 26, 2010 at 6:26 AM, Nate Vack njv...@wisc.edu wrote
On Oct 25, 2010, at 8:56 PM, Walker, David wrote:
Your criticisms of MARC-XML all seem to presume that MARC-XML is the goal,
the end point in the process. But MARC-XML is really better seen as a
utility, a middle step between binary MARC and the real goal, which is some
other useful and
On Tue, Oct 26, 2010 at 11:56 AM, Walker, David dwal...@calstate.edu wrote:
Your criticisms of MARC-XML all seem to presume that MARC-XML is the
goal, the end point in the process. But MARC-XML is really better seen as a
utility, a middle step between binary MARC and the real goal, which is
On Mon, Oct 25, 2010 at 9:32 PM, Alexander Johannesen
alexander.johanne...@gmail.com wrote:
Lots of people around the library world infrastructure will think
that since your data is now in XML it has taken some important step
towards being interoperable with the rest of the world, that
On Tue, Oct 26, 2010 at 12:48 PM, Bill Dueber b...@dueber.com wrote:
Here, I think you're guilty of radically underestimating lots of people
around the library world. No one thinks MARC is a good solution to
our modern problems, and no one who actually knows what MARC
is has trouble
I'm not a coder, but I undertook a study of XML some years after it
came onto the scene, with a (likely confused) notion that it would be
the next significant technology. I learned some XSL and later was able
to weave PubMed Central journal information (CSV transformed into XML)
together with
On Mon, Oct 25, 2010 at 10:10 PM, Alexander Johannesen
alexander.johanne...@gmail.com wrote:
Political? For sure. Engineering? Not so much.
Ok. Solve it. Let us know when you're done.
--
Bill Dueber
Library Systems Programmer
University of Michigan Library
Sorry. That was rude, and uncalled for. I disagree that the problem is
easily solved, even without the politics. There've been lots of attempts to
try to come up with a sufficiently expressive toolset for dealing with
biblio data, and we're still working on it. If you do think you've got some