On Apr 6, 2011, at 5:39 PM, Jon Gorman wrote:
http://zoia.library.nd.edu/tmp/tor.marc
When debugging any encoding issue it's always good to know:
a) how the records were obtained
b) how they have been manipulated before you touch them (basically, how
many times may they
XML well-formedness and validity checks can't find badly encoded
characters either -- char data that claims to be one encoding but is
really another, or that has been double-encoded and now means something
different than intended.
There's really no way to catch that but heuristics. All of
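One such heuristic, for the double-encoding case, can be sketched in Python. This is only an illustration, not something anyone in the thread posted: the function name looks_double_encoded is made up, and it assumes the second, mistaken decoding step was Latin-1 (in practice it is often Windows-1252 instead, which this sketch would miss).

```python
def looks_double_encoded(text: str) -> bool:
    """Heuristic: does `text` look like UTF-8 bytes that were wrongly
    decoded as Latin-1 somewhere along the way?

    Assumption (not from the thread): the wrong step was Latin-1.
    """
    try:
        # Undo the suspected wrong decode, then decode the bytes as UTF-8.
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1, or the bytes
        # are not valid UTF-8: no evidence of double encoding.
        return False
    # Pure ASCII round-trips unchanged, so it is not flagged.
    return repaired != text
```

A hit is only a hint. A short string of accented Latin-1 characters can happen to form valid UTF-8, so anything flagged still wants a human look.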
On 11 April 2011 16:40, Jonathan Rochkind rochk...@jhu.edu wrote:
XML well-formedness and validity checks can't find badly encoded characters
either -- char data that claims to be one encoding but is really another, or
that has been double-encoded and now means something different than
I'm making headway on my MARC records, but only through the use of brute
force.
I used wget to retrieve the MARC records (as well as associated PDF and text
files) from the
Internet Archive.
I know IA has some bad MARC records (and also records w/ bad encoding)
from my experience with
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] utf8 \xC2 does not map to Unicode
I am not familiar with that Perl module. But I'm more familiar than I'd want
with char encoding in MARC.
I don't recognize the bytes 0xC2 (there are some bytes I became pathetically
familiar with in past
Ack! While using the venerable Perl MARC::Batch module I get the following
error while trying to read a MARC record:
utf8 \xC2 does not map to Unicode
This is a real pain, and I'm hoping someone here can help me either: 1) trap
this error allowing me to move on, or 2) figure out how to open
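For context on the error itself: in UTF-8, 0xC2 is the lead byte of a two-byte sequence and must be followed by a continuation byte in the range 0x80-0xBF. A decoder that finds anything else after it has nothing to map to. The sketch below is Python, not the MARC::Batch API, and the helper name find_invalid_utf8 is hypothetical. It shows one way to locate every undecodable span in a raw byte string and keep going instead of stopping at the first one.

```python
def find_invalid_utf8(data: bytes):
    """Return a list of (offset, bad_bytes) for each span in `data`
    that a strict UTF-8 decoder rejects."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data decodes cleanly
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice we decoded.
            start = pos + err.start
            end = pos + err.end
            problems.append((start, data[start:end]))
            pos = end  # resume just past the offending bytes
    return problems
```

Run against the raw bytes of a record, the offsets can be compared with a hex dump to see what surrounds each bad byte.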
I am not familiar with that Perl module. But I'm more familiar than I'd
want with char encoding in MARC.
I don't recognize the bytes 0xC2 (there are some bytes I became
pathetically familiar with in past debugging, but I've forgotten em),
but the first things to look at:
1. Is your Marc file
Subject: Re: [CODE4LIB] utf8 \xC2 does not map to Unicode
I am not familiar with that Perl module. But I'm more familiar than I'd
want with char encoding in MARC.
I don't recognize the bytes 0xC2 (there are some bytes I became
pathetically familiar with in past debugging, but I've forgotten em
On Apr 6, 2011, at 4:46 PM, LeVan,Ralph wrote:
Ack! While using the venerable Perl MARC::Batch module I get the
following error while trying to read a MARC record:
utf8 \xC2 does not map to Unicode
Can you share the record somewhere? I suspect many of us have tools we
can turn loose
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] utf8 \xC2 does not map to Unicode
I am not familiar with that Perl module. But I'm more familiar than I'd want
with char encoding in MARC.
I don't recognize the bytes 0xC2 (there are some bytes I became pathetically
familiar with in past
in the very first record?
Ralph
-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Eric Lease Morgan
Sent: Wednesday, April 06, 2011 4:55 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] utf8 \xC2 does not map to Unicode
On Apr 6, 2011
: [CODE4LIB] utf8 \xC2 does not map to Unicode
I am not familiar with that Perl module. But I'm more familiar than I'd want
with char encoding in MARC.
I don't recognize the bytes 0xC2 (there are some bytes I became pathetically
familiar with in past debugging, but I've forgotten em), but the first
I'm not quite convinced that it's MARC-8 just because there's \xC2 ;).
If you look at a hex dump I'm seeing a lot of what might be combining
characters. The leader appears to have 'a' in the field to indicate
Unicode. In the raw hex I'm seeing a lot of two-character sequences
like: 756c 69c3
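The leader check described here can be sketched in Python. The MARC 21 leader is the first 24 bytes of a record, and position 09 holds the character coding scheme: 'a' for UCS/Unicode, blank for MARC-8. In the hex above, 0xC3 is a UTF-8 lead byte, so a strict decode is a reasonable second test. The function name describe_record is made up for illustration, and the sketch assumes it is handed the raw bytes of a single record.

```python
def describe_record(record: bytes):
    """Return (declared_encoding, decodes_as_utf8) for one raw MARC record.

    declared_encoding comes from leader position 09:
    b"a" means UCS/Unicode, anything else is treated here as MARC-8.
    """
    leader = record[:24]
    if len(leader) < 24:
        raise ValueError("record shorter than a 24-byte MARC leader")
    declared = "utf-8" if leader[9:10] == b"a" else "marc-8"
    try:
        record.decode("utf-8")  # strict: any bad sequence raises
        decodes = True
    except UnicodeDecodeError:
        decodes = False
    return declared, decodes
```

A result of ("utf-8", False) is the interesting case: the leader claims Unicode but the bytes do not decode, which matches the symptom in this thread.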
On 6 April 2011, Eric Lease Morgan wrote:
http://zoia.library.nd.edu/tmp/tor.marc
Happily, Kevin's magic formula recognizes this as MARC!
Bill
--
William Denton, Toronto : miskatonic.org www.frbr.org openfrbr.org