Hi Jackie,

On Tue, Feb 19, 2008 at 10:49 AM, Shieh, Jackie <[EMAIL PROTECTED]> wrote:
> What I have is an Excel spreadsheet for dissertations which I have saved as
> a tab delimited file (examining the file in TextPad, the diacritics appears
> to be fine), then read in and output the file as a utf-8 MARC file. I
> <print> title field confirming author field that contains diacritics with
> the title showing proper indicator values.

It looks like your input file is in ISO-8859-1.  In order to have Perl
do the character conversion to UTF-8, you can either assert that the
input filehandle is in the ISO-8859-1 character set by doing this:

use Encode;
binmode IN, ":encoding(iso-8859-1)";

or do the conversion explicitly like this:

use Encode;
my $converted_line = decode("iso-8859-1", $line);
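
As a rough sketch of how the two options compare, the snippet below
reads the same ISO-8859-1 bytes both ways and checks that they yield
the same character string.  The sample line and the variable names
are made up for illustration, and it uses an in-memory lexical
filehandle rather than your IN handle so that it runs on its own:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical tab-delimited line as ISO-8859-1 bytes (\xE9 = e-acute).
my $raw = "B\xE9la\tTitle\n";

# Option 1: declare the encoding on the input filehandle, so every
# line read from it is already decoded into characters.
open(my $in, "<:encoding(iso-8859-1)", \$raw) or die "open failed: $!";
my $from_layer = <$in>;
close($in);

# Option 2: treat the line as raw bytes and decode it explicitly.
my $from_decode = decode("iso-8859-1", $raw);

print "both approaches give the same string\n"
    if $from_layer eq $from_decode;
```

Either way, the point is that the decoding happens before any of the
data goes into a MARC::Record field.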

Because the conversion is not done before the data from the string is
added to the MARC::Record object, MARC::Record->as_usmarc() builds the
MARC blob from ISO-8859-1 strings and calculates the record length for
the leader by counting the resulting number of bytes.  The resulting
MARC blob is an ISO-8859-1 string, with Perl's internal utf8 flag
turned off.

However, during the print to your output filehandle, Perl
automatically converts the MARC blob string from ISO-8859-1 to UTF-8
because the output filehandle has the :utf8 layer set via binmode.
Each accented character grows from one byte to two in that
conversion, resulting in the discrepancy between the record length
stored in Leader/00-04 and the actual length of the output MARC blob.
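
To make the size mismatch concrete, here is a small sketch that does
not use MARC::Record at all.  It only compares the byte count of an
undecoded ISO-8859-1 string with the byte count after the same
conversion the :utf8 layer applies on output.  The sample string and
variable names are invented for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical field data as ISO-8859-1 bytes (\xE9 = e-acute).
my $latin1_bytes = "B\xE9la";

# Length of the undecoded string: this is the kind of count that
# ends up in the leader when no conversion has been done.
my $len_counted = length($latin1_bytes);

# Bytes actually written once each ISO-8859-1 character has been
# converted to UTF-8, as happens on a :utf8 output filehandle.
my $written     = encode("UTF-8", decode("iso-8859-1", $latin1_bytes));
my $len_written = length($written);

printf "counted: %d bytes, written: %d bytes\n",
    $len_counted, $len_written;
# prints "counted: 4 bytes, written: 5 bytes"
```

Every character above 0x7F in the record adds one extra byte in the
output, so the gap grows with the number of diacritics.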

Regards,

Galen
-- 
Galen Charlton
Koha Application Developer
LibLime
[EMAIL PROTECTED]
p: 1-888-564-2457 x709
