Re: reading and writing of utf-8 with marc::batch [double encoding]

2013-03-28 Thread Ashley Sanders
Eric, "How can I figure out whether or not a MARC record contains ONLY characters from the UTF-8 character set?" You can use a regex to check whether a string is valid UTF-8. There are various examples floating around the internet. An example is the one here:
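The example the message points to is cut off in the archive, so the following is only a sketch of the kind of regex check being described, not the one Ashley linked. It is modelled on a widely circulated byte-level pattern for well-formed UTF-8. The name `is_valid_utf8` is hypothetical, and widening the ASCII branch to the full 0x00-0x7F range (so that MARC delimiter bytes pass) is an assumption.

```python
import re

# Byte-level pattern for well-formed UTF-8. Assumption: the ASCII branch is
# widened to 0x00-0x7F so MARC delimiters (0x1D, 0x1E, 0x1F) are accepted.
UTF8_RE = re.compile(
    rb"""
    (?:
        [\x00-\x7F]                          # ASCII, including control bytes
      | [\xC2-\xDF][\x80-\xBF]               # non-overlong 2-byte sequence
      | \xE0[\xA0-\xBF][\x80-\xBF]           # 3-byte, excluding overlongs
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}    # straight 3-byte sequence
      | \xED[\x80-\x9F][\x80-\xBF]           # 3-byte, excluding surrogates
      | \xF0[\x90-\xBF][\x80-\xBF]{2}        # planes 1-3
      | [\xF1-\xF3][\x80-\xBF]{3}            # planes 4-15
      | \xF4[\x80-\x8F][\x80-\xBF]{2}        # plane 16
    )*
    """,
    re.VERBOSE,
)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if every byte of data belongs to a well-formed UTF-8 sequence."""
    return UTF8_RE.fullmatch(data) is not None
```

Note that a check like this only establishes well-formedness. Double-encoded text is itself valid UTF-8, so it passes the test.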

Re: reading and writing of utf-8 with marc::batch [double encoding]

2013-03-27 Thread Eric Lease Morgan
A number of people have alluded to the problem of double encoding, and I'm beginning to think this is true. I have isolated a number of problem records. They all contain diacritics, but they do not have an "a" in position #9 of the leader -- http://dh.crc.nd.edu/tmp/original.marc Can someone
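A rough way to list records like these is to read the raw file and compare position 9 of each leader with whether the record holds any bytes above 0x7F. The sketch below works on raw bytes and does not use MARC::Batch or any other MARC library. The function names are hypothetical. High bytes alone do not prove a record is UTF-8, because MARC-8 diacritics also use bytes above 0x7F.

```python
RECORD_TERMINATOR = b"\x1d"  # closes every MARC (ISO 2709) record


def split_records(raw: bytes) -> list:
    """Split the contents of a raw MARC file into individual records."""
    chunks = raw.split(RECORD_TERMINATOR)
    # Re-attach the terminator and skip empty trailing chunks.
    return [chunk + RECORD_TERMINATOR for chunk in chunks if chunk.strip()]


def leader_encoding(record: bytes) -> str:
    """Report what position 9 of the leader claims about the encoding."""
    flag = record[9:10]
    if flag == b"a":
        return "utf-8"    # "a" means UCS/Unicode
    if flag == b" ":
        return "marc-8"   # blank means MARC-8
    return "unknown"


def has_high_bytes(record: bytes) -> bool:
    """True if the record has any non-ASCII byte (diacritics in either encoding)."""
    return any(byte > 0x7F for byte in record)
```

Something like `[r for r in split_records(data) if leader_encoding(r) != "utf-8" and has_high_bytes(r)]` would then give the candidates worth inspecting by hand.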

Re: reading and writing of utf-8 with marc::batch [double encoding]

2013-03-27 Thread Galen Charlton
Hi, On Wed, Mar 27, 2013 at 11:20 AM, Eric Lease Morgan emor...@nd.edu wrote: I have isolated a number of problem records. They all contain diacritics, but they do not have an "a" in position #9 of the leader -- http://dh.crc.nd.edu/tmp/original.marc Can someone verify that the file contains

Re: reading and writing of utf-8 with marc::batch [double encoding]

2013-03-27 Thread Eric Lease Morgan
On Mar 27, 2013, at 4:59 PM, Eric Lease Morgan emor...@nd.edu wrote: When it calls as_usmarc, I think MARC::Batch tries to honor the value set in position #9 of the leader. In other words, if the leader is empty, then it tries to output records as MARC-8, and when the leader is a value of
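To make the double-encoding idea concrete, here is a small illustration of what happens when bytes that are already UTF-8 are treated as a single-byte encoding and encoded to UTF-8 a second time. It is a sketch under stated assumptions and does not reproduce what MARC::Batch does internally. Latin-1 stands in for the misread encoding, so a real MARC-8 mix-up would produce different bytes. Both function names are hypothetical.

```python
def double_encode(text: str) -> bytes:
    """Simulate the bug: UTF-8 bytes misread as Latin-1, then encoded again."""
    return text.encode("utf-8").decode("latin-1").encode("utf-8")


def undo_double_encoding(data: bytes) -> bytes:
    """Try to reverse one round of double encoding.

    Returns the input unchanged when the reversal does not yield valid UTF-8.
    """
    try:
        repaired = data.decode("utf-8").encode("latin-1")
        repaired.decode("utf-8")  # the result must itself be valid UTF-8
    except (UnicodeDecodeError, UnicodeEncodeError):
        return data
    return repaired


print(double_encode("\u00e9"))                        # b'\xc3\x83\xc2\xa9'
print(undo_double_encoding(b"\xc3\x83\xc2\xa9"))      # b'\xc3\xa9'
```

The repair is a heuristic. Text that legitimately contains a sequence such as "Ã©" would be altered too, so it is safer to run it only on records already known to be damaged.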

Re: reading and writing of utf-8 with marc::batch [double encoding]

2013-03-27 Thread Shelley Doljack
I use MarcEdit to view records and check if the mnemonic form of a diacritic (e.g. {eacute}) appears or not and what the LDR/09 value is. That's the best way I've come up with so far. MarcEdit is pretty good at guessing what the character encoding is without relying on the LDR/09 value. I think

Re: reading and writing of utf-8 with marc::batch [double encoding]

2013-03-27 Thread Galen Charlton
Hi, On Wed, Mar 27, 2013 at 2:11 PM, Eric Lease Morgan emor...@nd.edu wrote: Put another way, how can I determine whether or not position #9 of a given MARC leader is accurate? If position #9 is an "a", then how can I read the balance of the record to determine whether or not all the
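One possible way to approach Eric's question is to compare what position 9 claims with what a strict UTF-8 decode of the raw record says. The sketch below is not the answer given in the truncated reply. The function name `audit_record` and its three result labels are hypothetical.

```python
def audit_record(record: bytes) -> str:
    """Compare leader position 9 with the bytes actually found in the record."""
    flag = record[9:10]
    non_ascii = any(byte > 0x7F for byte in record)
    try:
        record.decode("utf-8")  # strict: rejects malformed sequences
        decodes_as_utf8 = True
    except UnicodeDecodeError:
        decodes_as_utf8 = False

    if flag == b"a":
        # The leader claims Unicode, so the bytes must decode cleanly.
        return "ok" if decodes_as_utf8 else "invalid-utf8"
    if non_ascii and decodes_as_utf8:
        # The leader does not claim Unicode, yet the high bytes form valid
        # UTF-8. Valid MARC-8 rarely does this by accident.
        return "suspect-unflagged-utf8"
    return "ok"
```

A record reported as "suspect-unflagged-utf8" matches the situation described earlier in the thread: diacritics present but no "a" in position 9. The check cannot flag double-encoded records, because those still decode as valid UTF-8.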