I use MarcEdit to view the records, checking whether the mnemonic form of a 
diacritic (e.g. {eacute}) appears and what the LDR/09 value is. That's the best 
way I've come up with so far. MarcEdit is pretty good at guessing the 
character encoding without relying on the LDR/09 value. I think there are 
some Perl modules that can "guess" the encoding of a string, but I've never 
used them. I'm interested in hearing about other methods (preferably 
automated) for detecting wrong or mixed character encodings in a MARC record. 
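
One automated check that might be worth trying is a strict UTF-8 decode of the raw record bytes, compared against LDR/09. The following is only a rough sketch, written in Python rather than Perl, and it is not how MarcEdit or MARC::Batch work. The function names utf8_check and leader_looks_wrong are made up for illustration, and it assumes you already have one raw record as a byte string with the 24-byte leader at the front.

```python
def utf8_check(raw: bytes):
    """Return (declared, valid_utf8, has_non_ascii) for one raw MARC record.

    declared      -- the character at leader position 9 ("a" = UCS/Unicode,
                     blank = MARC-8)
    valid_utf8    -- True if every byte sequence in the record is well-formed UTF-8
    has_non_ascii -- True if any byte is above 0x7F
    """
    declared = raw[9:10].decode("ascii", errors="replace")
    try:
        raw.decode("utf-8", errors="strict")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    has_non_ascii = any(b > 0x7F for b in raw)
    return declared, valid_utf8, has_non_ascii


def leader_looks_wrong(raw: bytes) -> bool:
    """Heuristic only: flag records whose LDR/09 disagrees with the bytes."""
    declared, valid_utf8, has_non_ascii = utf8_check(raw)
    if declared == "a":
        # Claims UTF-8 but contains byte sequences that are not UTF-8.
        return not valid_utf8
    # Claims MARC-8 (blank) yet the non-ASCII bytes decode cleanly as UTF-8.
    # This is suspicious rather than conclusive.
    return has_non_ascii and valid_utf8
```

A pure-ASCII record passes either way, since ASCII is valid in both MARC-8 and UTF-8. This check also cannot spot double-encoded data, because double-encoded text is still well-formed UTF-8; it only catches bytes that cannot be UTF-8 at all.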


----- Original Message -----
> From: "Eric Lease Morgan" <emor...@nd.edu>
> To: perl4lib@perl.org
> Sent: Wednesday, March 27, 2013 2:11:26 PM
> Subject: Re: reading and writing of utf-8 with marc::batch [double encoding]
> On Mar 27, 2013, at 4:59 PM, Eric Lease Morgan <emor...@nd.edu>
> wrote:
> > When it calls as_usmarc, I think MARC::Batch tries to honor the
> > value set in position #9 of the leader. In other words, if the
> > leader is empty, then it tries to output records as MARC-8, and
> > when the leader is a value of "a", it tries to encode the data as
> > UTF-8.
> How can I figure out whether or not a MARC record contains ONLY
> characters from the UTF-8 character set?
> Put another way, how can I determine whether or not position #9 of a
> given MARC leader is accurate? If position #9 is an "a", then how
> can I read the balance of the record to determine whether or not all
> the characters really and truly are UTF-8 encoded?
> --
> Eric "This Is Almost Too Much For Me" Morgan
