Eric,
How can I figure out whether or not a MARC record contains ONLY characters
from the UTF-8 character set?
You can use a regex to check whether a string is valid UTF-8. There are various
examples floating around the internet. An example is the one here:
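
For illustration, here is a Python sketch of the regex approach. It is not necessarily the pattern the link pointed to. It is a widely circulated byte-level pattern for well-formed UTF-8, widened to accept every ASCII byte on the assumption that raw MARC data (which uses 0x1D, 0x1E, and 0x1F as delimiters) is being tested. The function name is hypothetical.

```python
import re

# Byte-level pattern for well-formed UTF-8. It rejects overlong forms,
# UTF-16 surrogates, and code points above U+10FFFF.
UTF8_PATTERN = re.compile(
    rb"""
    (?:
        [\x00-\x7F]                          # ASCII, including MARC delimiters
      | [\xC2-\xDF][\x80-\xBF]               # 2-byte sequences, no overlongs
      |  \xE0[\xA0-\xBF][\x80-\xBF]          # 3-byte, excluding overlongs
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}    # ordinary 3-byte sequences
      |  \xED[\x80-\x9F][\x80-\xBF]          # 3-byte, excluding surrogates
      |  \xF0[\x90-\xBF][\x80-\xBF]{2}       # 4-byte, planes 1-3
      | [\xF1-\xF3][\x80-\xBF]{3}            # 4-byte, planes 4-15
      |  \xF4[\x80-\x8F][\x80-\xBF]{2}       # 4-byte, plane 16
    )*
    """,
    re.VERBOSE,
)


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if every byte of `data` belongs to a valid UTF-8 sequence."""
    return UTF8_PATTERN.fullmatch(data) is not None
```

In Python, a strict `data.decode("utf-8")` wrapped in a `try`/`except UnicodeDecodeError` performs the same check without a regex. Either way, the test only says the bytes are well-formed UTF-8; it does not say the text was encoded only once.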
A number of people have alluded to the problem of double encoding, and I'm
beginning to think this is true.
I have isolated a number of problem records. They all contain diacritics, but
they do not have an "a" in position #9 of the leader --
http://dh.crc.nd.edu/tmp/original.marc Can someone
Hi,
On Wed, Mar 27, 2013 at 11:20 AM, Eric Lease Morgan emor...@nd.edu wrote:
I have isolated a number of problem records. They all contain diacritics,
but they do not have an "a" in position #9 of the leader --
http://dh.crc.nd.edu/tmp/original.marc Can someone verify that the file
contains
On Mar 27, 2013, at 4:59 PM, Eric Lease Morgan emor...@nd.edu wrote:
When it calls as_usmarc, I think MARC::Batch tries to honor the value set in
position #9 of the leader. In other words, if that position is blank, then it
tries to output records as MARC-8, and when the leader is a value of
I use MarcEdit to view records and check if the mnemonic form of a diacritic
(e.g. {eacute}) appears or not and what the LDR/09 value is. That's the best
way I've come up with so far. MarcEdit is pretty good at guessing what the
character encoding is without relying on the LDR/09 value. I think
Hi,
On Wed, Mar 27, 2013 at 2:11 PM, Eric Lease Morgan emor...@nd.edu wrote:
Put another way, how can I determine whether or not position #9 of a given
MARC leader is accurate? If position #9 is an "a", then how can I read the
balance of the record to determine whether or not all the
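
One way to approach the question above is to compare what position #9 of the leader declares with what the bytes of the record actually look like. The following Python sketch works on raw record bytes and makes several assumptions: records are MARC 21 with a 24-byte leader, each record ends with the 0x1D terminator, "a" in position #9 means UTF-8 and a blank means MARC-8. All function names are hypothetical, and the result is a heuristic rather than proof.

```python
LEADER_LENGTH = 24            # MARC 21 leaders are 24 bytes long
RECORD_TERMINATOR = b"\x1d"   # each record ends with this byte


def split_records(data: bytes) -> list[bytes]:
    """Split the raw contents of a MARC file into individual records."""
    return [chunk + RECORD_TERMINATOR
            for chunk in data.split(RECORD_TERMINATOR) if chunk]


def declared_encoding(record: bytes) -> str:
    """Report what position #9 of the leader claims."""
    if len(record) < LEADER_LENGTH:
        return "unknown"
    flag = record[9:10]
    if flag == b"a":
        return "utf-8"
    if flag == b" ":
        return "marc-8"
    return "unknown"


def observed_encoding(record: bytes) -> str:
    """Classify the record's bytes as ascii, utf-8, or not-utf-8."""
    if record.isascii():      # no diacritics at all; requires Python 3.7+
        return "ascii"
    try:
        record.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return "not-utf-8"
    return "utf-8"


def leader_looks_consistent(record: bytes) -> bool:
    """Return False when the leader and the bytes appear to disagree."""
    declared = declared_encoding(record)
    observed = observed_encoding(record)
    if declared == "unknown":
        return False
    if observed == "ascii":   # pure ASCII is compatible with either value
        return True
    if declared == "utf-8":
        return observed == "utf-8"
    # Blank leader/09: non-ASCII bytes that decode cleanly as UTF-8 are
    # suspicious, because MARC-8 diacritics usually are not valid UTF-8.
    return observed == "not-utf-8"
```

A record flagged as inconsistent is a candidate for closer inspection, not a confirmed error. Note also that double-encoded text is still well-formed UTF-8, so a record can pass this check and still display incorrectly.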