Eric,
Have you tried checking how MARC::Batch views the encoding?
e.g.
# read write
while ( my $marc = $batch->next ) {
    print $marc->encoding();
    print $marc->as_usmarc;
}
It is supposed to pick up the encoding from 09 in the leader but I am not sure
this is totally reliable. If you know this
Eric--
I'm with Leif. The output you got looks like utf-8 displayed on a terminal that
doesn't support it. Whether you need to fix the terminal display is another
matter--I've never felt compelled to do so.
Anyway, I think you can now sign yourself Eric Did-it-right-the-first-time
Morgan!
Ok, I can't claim to be an expert, but from my own experience, I'd say
Paul is very likely right about double-encoding occurring. However,
the question ends up being where that happens, and in this case I
suspect how MARC::Batch will work could depend heavily on what version
of perl you're running
Hi,
On Wed, Mar 27, 2013 at 7:01 AM, Jon Gorman jonathan.gor...@gmail.com wrote:
One piece of advice is not to trust the terminal directly but pipe
into xxd. (And if possible, just try transforming the offending
record). Or use yaz-marcdump -v, which will also give the hex if I
remember
On Mar 26, 2013, at 5:57 PM, Leif Andersson leif.anders...@sub.su.se wrote:
my first guess would be your terminal is not utf8.
While I'm not positive my terminal is doing UTF-8, I think it is. When I dump
in the beginning the output to the terminal is correct. After I run my script
the
Hi Eric,
On Wed, Mar 27, 2013 at 10:26 AM, Eric Lease Morgan emor...@nd.edu wrote:
While I'm not positive my terminal is doing UTF-8, I think it is. When I
dump in the beginning the output to the terminal is correct. After I run my
script the output to the same terminal is incorrect.
Would
Whenever I see characters like é, I consult this website
http://www.i18nqa.com/debug/bug-utf-8-latin1.html to help me figure out what's
going on. You might find it helpful too.
Shelley
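
The kind of mismatch that page catalogues can be reproduced in a few lines. The following is a minimal Python sketch, not code from the thread: the function names are hypothetical, and it assumes the misreading codec is plain Latin-1 (the page also covers Windows-1252, which differs for bytes 0x80-0x9F). It shows how a UTF-8 "é" turns into two characters when misread, what the bytes look like after a second round of UTF-8 encoding, and how one layer of double encoding can be undone.

```python
def misread_as_latin1(text: str) -> str:
    """Return what UTF-8 bytes look like when wrongly decoded as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def double_encode(text: str) -> bytes:
    """Simulate double encoding: misread UTF-8 as Latin-1, then encode as UTF-8 again."""
    return misread_as_latin1(text).encode("utf-8")


def repair_double_encoding(data: bytes) -> str:
    """Undo one layer of double encoding.

    Raises UnicodeError if the data does not fit the assumed
    UTF-8 -> Latin-1 -> UTF-8 pattern, so clean data is not silently mangled.
    """
    return data.decode("utf-8").encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    print(misread_as_latin1("é"))            # two characters instead of one
    print(double_encode("é").hex(" "))       # four bytes instead of two
    print(repair_double_encoding(double_encode("é")))
```

Comparing the hex output with what xxd or yaz-marcdump -v shows for a problem record is one way to tell a display problem from genuinely double-encoded data.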
- Original Message -
From: Eric Lease Morgan emor...@nd.edu
To: perl4lib@perl.org
Sent: Tuesday,
A number of people have alluded to the problem of double encoding, and I'm
beginning to think this is true.
I have isolated a number of problem records. They all contain diacritics, but
they do not have an a in position #9 of the leader --
http://dh.crc.nd.edu/tmp/original.marc Can someone
Hi,
On Wed, Mar 27, 2013 at 11:20 AM, Eric Lease Morgan emor...@nd.edu wrote:
I have isolated a number of problem records. They all contain diacritics,
but they do not have an a in position #9 of the leader --
http://dh.crc.nd.edu/tmp/original.marc Can someone verify that the file
contains
On Mar 27, 2013, at 4:59 PM, Eric Lease Morgan emor...@nd.edu wrote:
When it calls as_usmarc, I think MARC::Batch tries to honor the value set in
position #9 of the leader. In other words, if the leader is empty, then it
tries to output records as MARC-8, and when the leader is a value of
I use MarcEdit to view records and check if the mnemonic form of a diacritic
(e.g. {eacute}) appears or not and what the LDR/09 value is. That's the best
way I've come up with so far. MarcEdit is pretty good at guessing what the
character encoding is without relying on the LDR/09 value. I think
Hi,
On Wed, Mar 27, 2013 at 2:11 PM, Eric Lease Morgan emor...@nd.edu wrote:
Put another way, how can I determine whether or not position #9 of a given
MARC leader is accurate? If position #9 is an a, then how can I read the
balance of the record to determine whether or not all the
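
One rough way to approach this question is to compare what position 9 of the leader declares with what the bytes in the rest of the record look like. The Python sketch below is a heuristic under stated assumptions, not a method proposed in the thread: it works on one raw record as bytes, treats "a" in LDR/09 as UTF-8 and anything else as MARC-8, and relies on the fact that MARC-8 diacritics (a high byte followed by an ASCII base letter) are normally not valid UTF-8. The function name and the return labels are hypothetical.

```python
def check_leader_09(record: bytes) -> str:
    """Compare the encoding declared in LDR/09 with the bytes after the leader.

    Returns one of these hypothetical labels:
      "ascii-only"            no high bytes, so LDR/09 cannot be confirmed or refuted
      "consistent-utf8"       LDR/09 is "a" and the bytes decode as UTF-8
      "declared-utf8-invalid" LDR/09 is "a" but the bytes are not valid UTF-8
      "declared-marc8-utf8"   LDR/09 is not "a" but the bytes decode as UTF-8
      "plausible-marc8"       LDR/09 is not "a" and the bytes are not valid UTF-8
    """
    if len(record) < 24:
        raise ValueError("a MARC leader is 24 bytes long")

    declared_utf8 = record[9:10] == b"a"   # position 9, counting from 0
    body = record[24:]                     # directory and fields after the leader

    if all(byte < 0x80 for byte in body):
        return "ascii-only"

    try:
        body.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False

    if declared_utf8:
        return "consistent-utf8" if valid_utf8 else "declared-utf8-invalid"
    return "declared-marc8-utf8" if valid_utf8 else "plausible-marc8"


if __name__ == "__main__":
    # Made-up 24-byte leader with a blank in position 9, followed by UTF-8 text.
    sample = b"00000nam  2200000 a 4500" + "café".encode("utf-8")
    print(check_leader_09(sample))
```

A result of "declared-marc8-utf8" matches the situation described for the problem records: diacritics stored as UTF-8 without an "a" in position 9. The check cannot detect text that is valid UTF-8 but double-encoded.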