I know how char encodings work in MARC ISO binary -- the encoding can
legally be either Marc8 or UTF8 (nothing else). The encoding of a
record is specified in its header. In the wild, specified encodings are
frequently wrong, or data includes weird mixed encodings. Okay!
But what's going on
There are probably a couple of answers to that.
XML rules define what character set is used. The encoding attribute on
the <?xml?> declaration is where you find out what character set is
being used.
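As a rough illustration of reading that attribute, here is a minimal Python sketch. The function name declared_xml_encoding is made up for this example, and it assumes the document is in an ASCII-compatible encoding (so not UTF-16). Falling back to UTF-8 when nothing is declared follows the XML 1.0 default for documents without a byte order mark.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration, e.g.
# <?xml version="1.0" encoding="ISO-8859-1"?>
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def declared_xml_encoding(data: bytes, default: str = "UTF-8") -> str:
    """Return the encoding named in the XML declaration, or `default`.

    Hypothetical helper: only handles ASCII-compatible encodings.
    """
    # Skip a UTF-8 byte order mark if one is present.
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]
    match = _DECL.match(data)
    return match.group(1).decode("ascii") if match else default
```

This only reports what the document claims about itself. It says nothing about whether the bytes that follow actually match the claim.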
I've always gone under the assumption that if an encoding wasn't
specified, then UTF-8 is in effect and
What's the legal thing to do? What's actually found 'in the wild' with
MarcXML?
In some cases, invalid XML.
In an ideal world, the encoding should be included in the declaration. But
I wouldn't trust it.
kyle
--
--
Kyle Banerjee
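Given the distrust of declared encodings expressed above, one cheap sanity check is to see whether the raw bytes decode as strict UTF-8 at all. This is only a sketch, and looks_like_utf8 is a name invented for the example:

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as strict UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

A True result proves only that the bytes are valid UTF-8, not that UTF-8 was intended. Pure ASCII data passes, for instance, and so would ASCII-only MARC-8.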
So what if the <?xml?> declaration says one charset encoding, but the
MARC header included in the MarcXML says a different encoding... which
one is the 'legal' one to believe?
Is it legal to have MarcXML that is not UTF-8 _or_ Marc8, that is an
entirely different charset that is legal in XML?
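To make the possible disagreement concrete, here is a small Python sketch that reads what each MARC leader inside a MarcXML document claims. It relies on leader position 09 in MARC 21, where "a" means UCS/Unicode and a blank means MARC-8. The function name leader_encodings is hypothetical, and the sketch does not decide which declaration wins; it only surfaces the leader's claim so it can be compared with the XML declaration.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML documents (MARC21 "slim" schema).
MARC_NS = "{http://www.loc.gov/MARC21/slim}"

def leader_encodings(xml_bytes: bytes) -> list[str]:
    """Return the encoding each record's leader claims (position 09)."""
    root = ET.fromstring(xml_bytes)
    claims = []
    for leader in root.iter(MARC_NS + "leader"):
        text = leader.text or ""
        flag = text[9:10]  # empty string if the leader is too short
        if flag == "a":
            claims.append("UTF-8")    # UCS/Unicode
        elif flag == " ":
            claims.append("MARC-8")
        else:
            claims.append("unknown")
    return claims
```

A document whose XML declaration says UTF-8 but whose leaders come back as "MARC-8" is exactly the conflicting case asked about here.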
On 4/17/2012 1:57 PM, Kyle Banerjee wrote:
In some cases, invalid XML. In an ideal world, the encoding should be
included in the declaration. But I wouldn't trust it. kyle
So would you use the Marc header payload instead?
Or you're just saying you wouldn't trust _any_ encoding declarations
Okay, maybe here's another way to approach the question.
If I want to have a MarcXML document encoded in Marc8 -- what should it
look like? What should be in the XML declaration? What should be in the
MARC header embedded in the XML? Or is it not in fact legal at all?
I'm going out on a limb here, but I don't think it is legal. There is no
,Ralph
Sent: Tuesday, April 17, 2012 12:51 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MarcXML and char encodings
There are probably a couple of answers to that.
XML rules define what character set is used. The encoding attribute on
the <?xml?> declaration is where you find out what
Thanks, this is helpful feedback at least.
I think it's completely irrelevant, when determining what is legal under
standards, to talk about what certain Java tools happen to do, though. I
don't care too much what some tool you happen to use does.
In this case, I'm _writing_ the tools. I want
On Behalf Of Jonathan Rochkind
Sent: Tuesday, April 17, 2012 2:46 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MarcXML and char encodings
Thanks, this is helpful feedback at least.
I think it's completely irrelevant, when determining what is legal under
standards, to talk about what
Jonathan Rochkind
Sent: Tuesday, April 17, 2012 14:18
Subject: Re: [CODE4LIB] MarcXML and char encodings
Okay, maybe here's another way to approach the question.
If I want to have a MarcXML document encoded in Marc8 -- what should it
look like? What should be in the XML declaration
So would you use the Marc header payload instead?
Or you're just saying you wouldn't trust _any_ encoding declarations you
find anywhere?
This.
The short version is that too many vendors and systems just supply some
value without making sure that's what they're spitting out. I haven't had
The discussions at the MARC standards group relating to Unicode all had
to do with using Unicode *within* ISO2709. I can't find any evidence
that MARCXML ever went through the standards process. (This may not be a
bad thing.) So none of what we know about the MARBI discussions and
resulting
Karen Coyle
Sent: Tuesday, April 17, 2012 15:41
Subject: Re: [CODE4LIB] MarcXML and char encodings
The discussions at the MARC standards group relating to Unicode all had
to do with using Unicode *within* ISO2709. I can't find any evidence
that MARCXML ever went through the standards
Subject: Re: [CODE4LIB] MarcXML and char encodings
From: Jonathan Rochkind rochk...@jhu.edu
To: CODE4LIB@LISTSERV.ND.EDU
CC:
Thanks, this is helpful feedback at least.
I think it's completely irrelevant, when determining what is legal under
standards, to talk about what certain Java tools happen
On 4/17/2012 3:01 PM, Sheila M. Morrissey wrote:
No -- it is perfectly legal -- but you MUST declare the encoding to BE Marc8 in
the XML prolog,
Wait, how can you declare a Marc8 encoding in an XML
declaration/prolog/whatever it's called?
The things that appear there need to be from a
Nope, you can't do that. There is no approved name for the MARC-8
encoding. As Andy said, the closest
[mailto:rochk...@jhu.edu]
Sent: Tuesday, April 17, 2012 4:19 PM
To: Code for Libraries
Cc: Sheila M. Morrissey
Subject: Re: [CODE4LIB] MarcXML and char encodings
On 4/17/2012 3:01 PM, Sheila M. Morrissey wrote:
No -- it is perfectly legal -- but you MUST declare the encoding to BE Marc8
in the XML
MARC-8. Cool in its time. Dumb now. Typical. --ELM
[mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
LeVan,Ralph
Sent: Tuesday, April 17, 2012 4:21 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MarcXML and char encodings
No -- it is perfectly legal -- but you MUST declare the encoding to
BE Marc8 in the XML prolog,
Wait, how can you