Many kudos for the quick response, and the valuable advice.
I get this many, many times...
element datafield: Schemas validity error : Element
'{http://www.loc.gov/MARC21/slim}datafield': This element is not expected.
Expected is ( {http://www.loc.gov/MARC21/slim}leader ).
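That validator message usually means a record's first child element isn't <leader>, which the MARC21slim schema requires before any controlfield or datafield. A minimal stdlib sketch (the helper name is mine, not part of any tool mentioned in the thread) for locating the offending records:

```python
# Sketch: find <record> elements whose first child is not <leader>,
# which is what this schema error usually indicates.
import xml.etree.ElementTree as ET

NS = "http://www.loc.gov/MARC21/slim"

def records_missing_leader(xml_text):
    """Yield 0-based positions of records whose first child isn't <leader>."""
    root = ET.fromstring(xml_text)
    for i, rec in enumerate(root.iter(f"{{{NS}}}record")):
        first = next(iter(rec), None)
        if first is None or first.tag != f"{{{NS}}}leader":
            yield i

sample = f"""<collection xmlns="{NS}">
  <record>
    <datafield tag="245" ind1="0" ind2="0">
      <subfield code="a">No leader here</subfield>
    </datafield>
  </record>
</collection>"""
print(list(records_missing_leader(sample)))  # [0]: the first record lacks a leader
```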
2015-08-27 16:52
Hi,
On Thu, Aug 27, 2015 at 9:22 AM, Sergio Letuche code4libus...@gmail.com wrote:
How could one make from a MARCXML record an mrc?
What could be the yaz-marcdump command for this? We have UTF-8 ISO2709
records
To convert from ISO2709 to MARCXML
yaz-marcdump -i marc -o marcxml INPUTFILE
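(For the direction Sergio originally asked about, MARCXML to .mrc, swapping the flags -- `yaz-marcdump -i marcxml -o marc INPUTFILE` -- should do it.) For the curious, the framing such a conversion produces can be sketched with the stdlib alone; this is an illustrative reimplementation of the ISO 2709 layout, not Index Data's code:

```python
# Sketch of the MARCXML -> ISO2709 conversion, stdlib only. Each field gets
# a 12-byte directory entry (tag, 4-digit length, 5-digit start offset);
# the leader's record length (00-04) and base address (12-16) are patched in.
import xml.etree.ElementTree as ET

NS = "{http://www.loc.gov/MARC21/slim}"
FT, RT, SF = b"\x1e", b"\x1d", b"\x1f"  # field/record terminators, subfield delimiter

def record_to_iso2709(rec):
    leader = rec.find(NS + "leader").text
    directory, body = b"", b""
    for f in rec:
        if f.tag == NS + "controlfield":
            data = (f.text or "").encode("utf-8") + FT
        elif f.tag == NS + "datafield":
            data = (f.get("ind1", " ") + f.get("ind2", " ")).encode("utf-8")
            for sf in f:
                data += SF + sf.get("code").encode("utf-8") + (sf.text or "").encode("utf-8")
            data += FT
        else:
            continue  # skip the leader and anything unexpected
        directory += f'{f.get("tag")}{len(data):04d}{len(body):05d}'.encode("ascii")
        body += data
    base = 24 + len(directory) + 1          # leader + directory + terminator
    total = base + len(body) + 1            # + record terminator
    leader = f"{total:05d}" + leader[5:12] + f"{base:05d}" + leader[17:24]
    return leader.encode("utf-8") + directory + FT + body + RT

xml = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">demo1</controlfield>
  <datafield tag="245" ind1="0" ind2="0">
    <subfield code="a">A title</subfield>
  </datafield>
</record>"""
out = record_to_iso2709(ET.fromstring(xml))
print(out[:5], len(out))  # the 5-digit record length matches the byte count
```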
Hi,
On Thu, Aug 27, 2015 at 9:41 AM, Sergio Letuche code4libus...@gmail.com wrote:
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
<record>
OK, that's what I'd expect from a MARC21slim MARCXML file. Since you
say the file is valid, try comparing it against
Terry Reese created Marc Edit which is a great program and will do just
what you need.
http://marcedit.reeset.net/
Mark Sullivan
Executive Director
IDS Project
Milne Library
1 College Circle
SUNY Geneseo
Geneseo, NY 14454
(585) 245-5172
On 8/27/2015 9:22
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
<record>
2015-08-27 16:40 GMT+03:00 Galen Charlton g...@esilibrary.com:
Hi,
On Thu, Aug 27, 2015 at 9:36 AM, Sergio Letuche code4libus...@gmail.com
wrote:
and I get
yaz_marc_read_xml failed
the marcxml
Hi,
On Thu, Aug 27, 2015 at 10:03 AM, Sergio Letuche
code4libus...@gmail.com wrote:
Many kudos for the quick response, and the valuable advice.
I get this many, many times...
element datafield: Schemas validity error : Element
'{http://www.loc.gov/MARC21/slim}datafield': This element is not
There are probably a couple of answers to that.
XML rules define what character set is used. The encoding attribute in
the <?xml ... ?> declaration is where you find out what character set is
being used.
I've always gone under the assumption that if an encoding wasn't
specified, then UTF-8 is in effect and
What's the legal thing to do? What's actually found 'in the wild' with
MarcXML?
In some cases, invalid XML.
In an ideal world, the encoding should be included in the declaration. But
I wouldn't trust it.
kyle
--
Kyle Banerjee
So what if the <?xml ... ?> declaration says one charset encoding, but the
MARC header included in the MarcXML says a different encoding... which
one is the 'legal' one to believe?
Is it legal to have MarcXML that is not UTF-8 _or_ Marc8, that is an
entirely different charset that is legal in XML?
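The two declarations the thread keeps contrasting can at least be read side by side: the encoding pseudo-attribute of the <?xml ...?> declaration, and leader position 09 inside the record ('a' = UCS/Unicode, blank = MARC-8). A small sniffing sketch (function names are mine):

```python
# Compare the XML declaration's encoding with the MARC leader's own claim.
import re

def xml_declared_encoding(raw):
    """Return the encoding named in the <?xml ...?> declaration, if any.
    None means the XML spec's default (UTF-8/UTF-16) is in effect."""
    m = re.match(rb'<\?xml[^?>]*encoding=["\']([^"\']+)["\']', raw)
    return m.group(1).decode("ascii") if m else None

def leader_declared_encoding(leader):
    """Leader position 09: 'a' = UCS/Unicode, blank = MARC-8."""
    return "UCS/Unicode" if leader[9:10] == "a" else "MARC-8"

doc = b'<?xml version="1.0" encoding="UTF-8"?><collection/>'
print(xml_declared_encoding(doc))                             # UTF-8
print(leader_declared_encoding("00068nam a2200049 a 4500"))   # UCS/Unicode
```

When the two disagree, the XML declaration is the one an XML parser will actually obey; the leader byte just rides along as payload.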
On 4/17/2012 1:57 PM, Kyle Banerjee wrote:
In some cases, invalid XML. In an ideal world, the encoding should be
included in the declaration. But I wouldn't trust it. kyle
So would you use the Marc header payload instead?
Or you're just saying you wouldn't trust _any_ encoding declarations
Okay, maybe here's another way to approach the question.
If I want to have a MarcXML document encoded in Marc8 -- what should it
look like? What should be in the XML declaration? What should be in the
MARC header embedded in the XML? Or is it not in fact legal at all?
If I want to have a MarcXML document encoded in Marc8 -- what should it
look like? What should be in the XML declaration? What should be in the
MARC header embedded in the XML? Or is it not in fact legal at all?
I'm going out on a limb here, but I don't think it is legal. There is no
,Ralph
Sent: Tuesday, April 17, 2012 12:51 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MarcXML and char encodings
There are probably a couple of answers to that.
XML rules define what character set is used. The encoding attribute in
the <?xml ... ?> declaration is where you find out what
Thanks, this is helpful feedback at least.
I think it's completely irrelevant, when determining what is legal under
standards, to talk about what certain Java tools happen to do, though; I
don't care too much what some tool you happen to use does.
In this case, I'm _writing_ the tools. I want
] On Behalf Of
Jonathan Rochkind
Sent: Tuesday, April 17, 2012 2:46 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MarcXML and char encodings
Thanks, this is helpful feedback at least.
I think it's completely irrelevant, when determining what is legal under
standards, to talk about what
Jonathan Rochkind
Sent: Tuesday, April 17, 2012 14:18
Subject: Re: [CODE4LIB] MarcXML and char encodings
Okay, maybe here's another way to approach the question.
If I want to have a MarcXML document encoded in Marc8 -- what should it
look like? What should be in the XML declaration
So would you use the Marc header payload instead?
Or you're just saying you wouldn't trust _any_ encoding declarations you
find anywhere?
This.
The short version is that too many vendors and systems just supply some
value without making sure that's what they're spitting out. I haven't had
The discussions at the MARC standards group relating to Unicode all had
to do with using Unicode *within* ISO2709. I can't find any evidence
that MARCXML ever went through the standards process. (This may not be a
bad thing.) So none of what we know about the MARBI discussions and
resulting
Karen Coyle
Sent: Tuesday, April 17, 2012 15:41
Subject: Re: [CODE4LIB] MarcXML and char encodings
The discussions at the MARC standards group relating to Unicode all had
to do with using Unicode *within* ISO2709. I can't find any evidence
that MARCXML ever went through the standards
: Re: [CODE4LIB] MarcXML and char encodings
From: Jonathan Rochkind rochk...@jhu.edu
To: CODE4LIB@LISTSERV.ND.EDU
CC:
Thanks, this is helpful feedback at least.
I think it's completely irrelevant, when determining what is legal under
standards, to talk about what certain Java tools happen
On 4/17/2012 3:01 PM, Sheila M. Morrissey wrote:
No -- it is perfectly legal -- but you MUST declare the encoding to BE Marc8 in
the XML prolog,
Wait, how can you declare a Marc8 encoding in an XML
declaration/prolog/whatever it's called?
The things that appear there need to be from a
No -- it is perfectly legal -- but you MUST declare the encoding to
BE Marc8 in the XML prolog,
Wait, how can you declare a Marc8 encoding in an XML
declaration/prolog/whatever it's called?
Nope, you can't do that. There is no approved name for the MARC-8
encoding. As Andy said, the closest
[mailto:rochk...@jhu.edu]
Sent: Tuesday, April 17, 2012 4:19 PM
To: Code for Libraries
Cc: Sheila M. Morrissey
Subject: Re: [CODE4LIB] MarcXML and char encodings
On 4/17/2012 3:01 PM, Sheila M. Morrissey wrote:
No -- it is perfectly legal -- but you MUST declare the encoding to BE Marc8
in the XML
MARC-8. Cool in its time. Dumb now. Typical. --ELM
[mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
LeVan,Ralph
Sent: Tuesday, April 17, 2012 4:21 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MarcXML and char encodings
No -- it is perfectly legal -- but you MUST declare the encoding to
BE Marc8 in the XML prolog,
Wait, how can you
I'm going to guess that it's because 59x fields are defined for local use:
http://www.loc.gov/marc/bibliographic/bd59x.html
...but someone from LC should be able to confirm.
-Jon
--
Jon Stroop
Metadata Analyst
Firestone Library
Princeton University
Princeton, NJ 08544
Email:
: Thursday, May 19, 2011 11:07 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARCXML to MODS: 590 Field
I'm going to guess that it's because 59x fields are defined for local use:
http://www.loc.gov/marc/bibliographic/bd59x.html
...but someone from LC should be able to confirm
for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of
Nate Vack
Sent: Friday, November 19, 2010 12:34 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARCXML - What is it for?
On Mon, Oct 25, 2010 at 2:22 PM, Eric Hellman e...@hellman.net wrote:
I think you'd have a very hard time
On 11 Nov 2010, at 14:47, Galen Charlton wrote:
Hi,
On Thu, Nov 11, 2010 at 6:26 AM, J.D.Gravestock
j.d.gravest...@open.ac.uk wrote:
I'd be interested to know if anyone is using a good marcxml to marc
converter (other than marcedit, i.e. non windows). I've tried the
perl module marc::xml
: Re: [CODE4LIB] marcxml
There actually is a version of MARCEdit for Linux now. I think
(although I can't remember and can't find it on the site) that it
relies on Mono.
MARCEdit download page:
http://people.oregonstate.edu/~reeset/marcedit/html/downloads.html
On Nov 11, 2010, at 6:26 AM, J.D.Gravestock wrote:
I'd be interested to know if anyone is using a good marcxml to marc converter
(other than marcedit, i.e. non windows).
If I understand your question correctly, then try Index Data's yaz-marcdump
application which is a component of Yaz. [1]
There actually is a version of MARCEdit for Linux now. I think (although I
can't remember and can't find it on the site) that it relies on Mono.
MARCEdit download page:
http://people.oregonstate.edu/~reeset/marcedit/html/downloads.html
Joel
-Original Message-
From: Code for Libraries
:40 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] marcxml
There actually is a version of MARCEdit for Linux now. I think
(although I can't remember and can't find it on the site) that it
relies on Mono.
MARCEdit download page:
http://people.oregonstate.edu/~reeset/marcedit/html
I've only just had a chance to catch up on this thread. I'm not
offended in the least by Turbomarc (anything round-trippable should
serve just as well as an internal representation of MARC, right?), but I
am a little puzzled--what are the 'special cases' alluded to in the blog
post? When
Let me openly state that I've never used Turbomarc. I believe the special
case they are referring to is the subfield code with a value of η, which is
non-alphanumeric. I don't know enough about MARC to even begin guessing what
this means or why it might occur (or not).
The use case I see for
On 28 October 2010 17:37, MJ Suhonos m...@suhonos.ca wrote:
Let me openly state that I've never used Turbomarc. I believe the special
case they are referring to is the subfield code with a value of η, which
is non-alphanumeric. I don't know enough about MARC to even begin guessing
what
The first comment claims a 30-40% speedup in XML parsing, which seems
obvious when you compare the number of characters in the example provided:
277 vs. 419, or about 34% fewer going through the parser.
The speedup can be much greater than that -- from the blog post
itself, Using
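As I read the Index Data post, Turbomarc's gain comes from moving tags and subfield codes out of attributes and into element names, so a streaming parser does fewer attribute lookups on a smaller document. A rough sketch of the renaming; the compact element names here are assumptions based on the post's example, not a published spec:

```python
# Rough sketch of a MARCXML -> Turbomarc-style renaming: tags and subfield
# codes move from attributes into element names (element names assumed).
import xml.etree.ElementTree as ET

NS = "{http://www.loc.gov/MARC21/slim}"

def to_turbomarc_style(rec):
    out = ET.Element("r")
    for f in rec:
        if f.tag == NS + "leader":
            ET.SubElement(out, "l").text = f.text
        elif f.tag == NS + "controlfield":
            ET.SubElement(out, "c" + f.get("tag")).text = f.text
        elif f.tag == NS + "datafield":
            d = ET.SubElement(out, "d" + f.get("tag"),
                              i1=f.get("ind1", " "), i2=f.get("ind2", " "))
            for sf in f:
                ET.SubElement(d, "s" + sf.get("code")).text = sf.text
    return out

marcxml = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <datafield tag="245" ind1="0" ind2="0">
    <subfield code="a">A title</subfield>
  </datafield>
</record>"""
rec = ET.fromstring(marcxml)
before = len(ET.tostring(rec))
after = len(ET.tostring(to_turbomarc_style(rec)))
print(before, after)  # the compact form is noticeably smaller
```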
Johannesen [alexander.johanne...@gmail.com]
Sent: Monday, October 25, 2010 7:10 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARCXML - What is it for?
On Tue, Oct 26, 2010 at 12:48 PM, Bill Dueber b...@dueber.com wrote:
Here, I think you're guilty of radically underestimating lots
[mailto:code4...@listserv.nd.edu] On Behalf Of
Walker, David
Sent: Monday, October 25, 2010 8:57 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARCXML - What is it for?
b) expanding it to be actually useful and interesting.
But here I think you've missed the very utility of MARC-XML.
Let's say you
But it looks just like the old thing using insert data scheme and some
templates?
Ah yes, but now we're doing it in XML!
I think this applies to 90% of instances where XML was adopted, especially
within the enterprise IT industry. Through marketing or misunderstanding,
XML was
Alex,
I think the problem is data like this:
http://lccn.loc.gov/96516389/marcxml
And while we can probably figure out a pattern to get the semantics
out of this record, there is no telling how many other variations exist
within our collections.
So we've got lots of this data that is both hard to
This is no justification for not doing things better. (And I'd love to
know what the hard bits are; it's always interesting to hear from various
people what they think the *real* problems of library data are, as
opposed to any other problems they have.)
The problem is you have to deal
Hi,
On Tue, Oct 26, 2010 at 1:23 PM, Bill Dueber b...@dueber.com wrote:
Sorry. That was rude, and uncalled for. I disagree that the problem is
easily solved, even without the politics. There've been lots of attempts to
try to come up with a sufficiently expressive toolset for dealing with
On Tue, 2010-10-26 at 03:32 +0200, Alexander Johannesen wrote:
Here's our new thing. And we did it by simply converting all our
MARC into MARCXML that runs on a cron job every midnight, and a bit of
horrendous XSLT that's impossible to maintain.
I am in the development department of our
I think:
1. Marc must die. It has lived long enough.
2. But everybody uses Marc (which is in fact good), too many people are keeping
it alive.
3. MARC in XML does not solve the problem, but it makes the suffering so much
less painful
Peter
Political? For sure. Engineering? Not so much.
Ok. Solve it. Let us know when you're done.
Wow, lamest reply so far. Surely you could muster a tad bit better? I
was excited about getting a list of the hardest problems, for example,
I'd love to see that. Then by that perhaps you could explain
I'd suspect that MARCXML isn't going anywhere fast, a shame perhaps.
The key difference between MARCXML and MARC is that MARCXML inherits
XML's internationalisation features.
It is an aspect at which MARC is very poor.
Andrew
--
Andrew Cunningham
Senior Project Manager, Research and
On Oct 25, 2010, at 10:31 PM, Alexander Johannesen wrote:
Political? For sure. Engineering? Not so much.
Ok. Solve it. Let us know when you're done.
Wow, lamest reply so far. Surely you could muster a tad bit better? I
was excited about getting a list of the hardest problems, for example,
Of Smith,Devon
[smit...@oclc.org]
Sent: Tuesday, October 26, 2010 7:44 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARCXML - What is it for?
One way is to first transform the MARC into MARC-XML. Then you can
use XSLT to crosswalk the MARC-XML
into that other schema. Very handy.
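The crosswalk step can be sketched without XSLT too. Here is the shape of a single mapping (MARC 245$a to a MODS-style titleInfo/title) in plain Python; a real stylesheet like LC's MARCXML-to-MODS XSLT expresses many such rules declaratively:

```python
# Sketch of a one-field crosswalk (MARCXML 245$a -> MODS titleInfo/title),
# the kind of mapping an XSLT stylesheet would express declaratively.
import xml.etree.ElementTree as ET

MARC = "{http://www.loc.gov/MARC21/slim}"

def title_to_mods(rec):
    """Map MARC 245$a into a minimal MODS-shaped element tree."""
    mods = ET.Element("mods", xmlns="http://www.loc.gov/mods/v3")
    for df in rec.iter(MARC + "datafield"):
        if df.get("tag") == "245":
            ti = ET.SubElement(mods, "titleInfo")
            for sf in df:
                if sf.get("code") == "a":
                    ET.SubElement(ti, "title").text = sf.text
    return mods

rec = ET.fromstring(
    '<record xmlns="http://www.loc.gov/MARC21/slim">'
    '<datafield tag="245" ind1="0" ind2="0">'
    '<subfield code="a">A title</subfield></datafield></record>')
print(title_to_mods(rec).find("titleInfo/title").text)  # A title
```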
Your
MARC records break parsing far too frequently. Apart from requiring no
truly specialized tools, MARCXML should—should!—eliminate many of
those problems. That's not to mention that MARC character sets vary a
lot (DanMARC anyone?), and more even in practice than in theory.
From my perspective the
I'm not a big user of MARCXML, but I can think of a few reasons off the top of
my head:
- Existing libraries for reading, manipulating and searching XML-based
documents are very mature.
- Documents can be validated for their well-formedness using these existing
tools and a pre-defined schema
Dear Nate,
There is a trade-off: do you want very fast processing of data - go for binary
data. do you want to share your data globally easily in many (not per se
library related) environments - go for XML/RDF.
Open your data and do both :-)
Pat
Sent from my iPhone
On 25 Oct 2010, at 20:39,
It's helpful to think of MARCXML as a sort of lingua franca.
- Existing libraries for reading, manipulating and searching XML-based
documents are very mature.
Including XSLT and XPath; very powerful stuff.
There's nothing stopping you from reading the MARCXML into a binary blob and
working
- XML is self-describing, binary is not.
Not to quibble, but that's only in a theoretical sense here. Something
like Amazon XML is truly self-describing. MARCXML is self-obfuscating.
At least MARC records kinda imitate catalog cards.
:)
Tim
On Mon, Oct 25, 2010 at 2:50 PM, Andrew Hankinson
On Monday, October 25, 2010 1:50 PM, Andrew Hankinson wrote:
- Documents can be validated for their well-formedness using these existing
tools and a pre-defined schema (a validator for MARC would need to be
custom-coded)
In Perl, MARC::Lint might be an example of such a validator (though I
I think you'd have a very hard time demonstrating any speed advantage to MARC
over MARCXML. XML parsers have been speed optimized out the wazoo; If there
exists a MARC parser that has ever been speed-optimized without serious
compromise, I'm sure someone on this list will have a good story
On Mon, Oct 25, 2010 at 2:09 PM, Tim Spalding t...@librarything.com wrote:
- XML is self-describing, binary is not.
Not to quibble, but that's only in a theoretical sense here. Something
like Amazon XML is truly self-describing. MARCXML is self-obfuscating.
At least MARC records kinda imitate
Hiya,
On Tue, Oct 26, 2010 at 6:26 AM, Nate Vack njv...@wisc.edu wrote:
Switching to an XML format doesn't help with that at all.
I'm willing to take it further and say that MARCXML was the worst
thing the library world ever did. Some might argue it was a good first
step, and that it was better
I guess what I meant is that in MARCXML, you have a datafield element with
subsequent subfield elements each with fairly clear attributes, which, while
not my idea of fun Sunday-afternoon reading, requires less specialized tools to
parse (hello Textmate!) and is a bit easier than trying to
I'll just leave this here:
http://www.indexdata.com/blog/2010/05/turbomarc-faster-xml-marc-records
That trade-off ought to offend both camps, though I happen to think it's quite
clever.
MJ
On 2010-10-25, at 3:22 PM, Eric Hellman wrote:
I think you'd have a very hard time demonstrating any
On Mon, Oct 25, 2010 at 12:38 PM, Tim Spalding t...@librarything.com wrote:
Does processing speed of something matter anymore? You'd have to be
doing a LOT of processing to care, wouldn't you?
Data migrations and data dumps are a common use case. Needing to break or
make hundreds of thousands
Does processing speed of something matter anymore? You'd have to be
doing a LOT of processing to care, wouldn't you?
Tim
On Mon, Oct 25, 2010 at 3:35 PM, MJ Suhonos m...@suhonos.ca wrote:
I'll just leave this here:
http://www.indexdata.com/blog/2010/05/turbomarc-faster-xml-marc-records
That
On Mon, Oct 25, 2010 at 12:22 PM, Eric Hellman e...@hellman.net wrote:
I think you'd have a very hard time demonstrating any speed advantage to
MARC over MARCXML. XML parsers have been speed optimized out the wazoo; If
there exists a MARC parser that has ever been speed-optimized without
Yes, it is designed to be a round-trippable expression of ordinary marc
in XML. Some reasons this is useful:
1. No maximum record length, unlike actual MARC, whose five-digit length
field tops out at 99,999 bytes.
2. You can use XSLT and other XML tools to work with it, and store it in
stores optimized for XML (or that only
MODS was an attempt to mostly-but-not-entirely-roundtrippably represent
data in MARC in a format that's more 'normal' XML, without packed bytes
in elements, with element names that are more or less self-documenting,
etc. It's caught on even less than MARCXML though, so if you find
MARCXML
Marc in JSON can be a nice middle-ground: faster/smaller than MarcXML
(although still probably not as fast or small as binary), and based on a
standard low-level data format, so it's easier to work with using existing
tools (and developers' eyes) than binary, with no maximum record length.
There have been a couple competing
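For reference, the MARC-in-JSON shape that later gained traction (and that pymarc adopted) looks roughly like this; treat the exact layout as one proposal among the competing ones, not a settled standard:

```python
# One MARC-in-JSON shape: leader as a string, fields as one-key objects
# keyed by tag. A sketch of the commonly cited proposal, not a standard.
import json

record = {
    "leader": "00068nam a2200049 a 4500",
    "fields": [
        {"001": "demo1"},
        {"245": {"ind1": "0", "ind2": "0",
                 "subfields": [{"a": "A title"}]}},
    ],
}
text = json.dumps(record)
assert json.loads(text) == record  # lossless round-trip
print(len(text), "bytes")
```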
Tim Spalding wrote:
Does processing speed of something matter anymore? You'd have to be
doing a LOT of processing to care, wouldn't you?
Yes, which sometimes you are. Say, when you're indexing 2 or 3 or 10
million marc records into, say, solr.
Which is faster depends on what language and
JSON++
I routinely re-index about 2.5M JSON records (originally from binary MARC), and
it's several orders of magnitude faster than XML (measured in single-digit
minutes rather than double-digit hours). I'm not sure if it's in the same
range as binary MARC, but as Tim says, it's plenty fast
Kyle Banerjee wrote:
On Mon, Oct 25, 2010 at 12:38 PM, Tim Spalding t...@librarything.com wrote:
Does processing speed of something matter anymore? You'd have to be
doing a LOT of processing to care, wouldn't you?
Data migrations and data dumps are a common use case. Needing to break or
@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARCXML - What is it for?
Hiya,
On Tue, Oct 26, 2010 at 6:26 AM, Nate Vack njv...@wisc.edu wrote:
Switching to an XML format doesn't help with that at all.
I'm willing to take it further and say that MARCXML was the worst thing the
library world ever did
Ray Denenberg, Library of Congress r...@loc.gov wrote:
It really is possible to make your point without being quite so obnoxious.
Obnoxious?
Alex
--
Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
--- http://shelter.nu/blog/
I know there are two parts to this discussion (speed on the one hand,
applicability/features on the other), but for the former, running a little
benchmark just isn't that hard. Aren't we supposed to, you know, prefer to
make decisions based on data?
Note: I'm only testing deserialization because
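In that spirit, a toy deserialization-only benchmark takes a dozen lines of stdlib; absolute numbers vary by machine and parser, so treat it as a template rather than a result:

```python
# Toy deserialization benchmark: parse the same record N times as JSON
# and as XML and compare wall time. Numbers will vary by machine.
import json, timeit
import xml.etree.ElementTree as ET

xml_doc = ('<record><leader>00000nam a2200000 a 4500</leader>'
           '<datafield tag="245" ind1="0" ind2="0">'
           '<subfield code="a">A title</subfield></datafield></record>')
json_doc = json.dumps({"leader": "00000nam a2200000 a 4500",
                       "fields": [{"245": {"ind1": "0", "ind2": "0",
                                           "subfields": [{"a": "A title"}]}}]})

n = 10_000
t_xml = timeit.timeit(lambda: ET.fromstring(xml_doc), number=n)
t_json = timeit.timeit(lambda: json.loads(json_doc), number=n)
print(f"xml: {t_xml:.3f}s  json: {t_json:.3f}s for {n} parses")
```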
for Libraries [code4...@listserv.nd.edu] On Behalf Of Alexander
Johannesen [alexander.johanne...@gmail.com]
Sent: Monday, October 25, 2010 12:38 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARCXML - What is it for?
Hiya,
On Tue, Oct 26, 2010 at 6:26 AM, Nate Vack njv...@wisc.edu wrote
On Oct 25, 2010, at 8:56 PM, Walker, David wrote:
Your criticisms of MARC-XML all seem to presume that MARC-XML is the goal,
the end point in the process. But MARC-XML is really better seen as a
utility, a middle step between binary MARC and the real goal, which is some
other useful and
On Tue, Oct 26, 2010 at 11:56 AM, Walker, David dwal...@calstate.edu wrote:
Your criticisms of MARC-XML all seem to presume that MARC-XML is the
goal, the end point in the process. But MARC-XML is really better seen as a
utility, a middle step between binary MARC and the real goal, which is
On Mon, Oct 25, 2010 at 9:32 PM, Alexander Johannesen
alexander.johanne...@gmail.com wrote:
Lots of people around the library world infrastructure will think
that since your data is now in XML it has taken some important step
towards being interoperable with the rest of the world, that
On Tue, Oct 26, 2010 at 12:48 PM, Bill Dueber b...@dueber.com wrote:
Here, I think you're guilty of radically underestimating lots of people
around the library world. No one thinks MARC is a good solution to
our modern problems, and no one who actually knows what MARC
is has trouble
I'm not a coder, but I undertook a study of XML some years after it
came onto the scene, and with a likely confused notion that it would be
the next significant technology, I learned some XSL and later was able
to weave PubMed Central journal information (CSV transformed into XML)
together with
On Mon, Oct 25, 2010 at 10:10 PM, Alexander Johannesen
alexander.johanne...@gmail.com wrote:
Political? For sure. Engineering? Not so much.
Ok. Solve it. Let us know when you're done.
--
Bill Dueber
Library Systems Programmer
University of Michigan Library
Sorry. That was rude, and uncalled for. I disagree that the problem is
easily solved, even without the politics. There've been lots of attempts to
try to come up with a sufficiently expressive toolset for dealing with
biblio data, and we're still working on it. If you do think you've got some