Whenever I see characters like é, I consult this website 
http://www.i18nqa.com/debug/bug-utf-8-latin1.html to help me figure out what's 
going on. You might find it helpful too.
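
As a rough illustration of the pattern that page describes, here is a minimal sketch using only Perl's core Encode module. The variable names are hypothetical and the snippet is not taken from the script quoted below; it only shows how the two UTF-8 bytes of "é" turn into "é" when they are read as Latin-1, and how re-encoding that result doubles the byte count.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# "é" (U+00E9) encoded as UTF-8 is the two bytes 0xC3 0xA9.
my $bytes = encode('UTF-8', "\x{E9}");

# Misreading those two bytes as Latin-1 yields two characters,
# U+00C3 and U+00A9, which display as "é".
my $mojibake = decode('ISO-8859-1', $bytes);

# Encoding the misread string as UTF-8 again gives four bytes
# (double encoding).
my $double = encode('UTF-8', $mojibake);

printf "%s\n", join ' ', map { sprintf '%02X',   ord } split //, $bytes;     # C3 A9
printf "%s\n", join ' ', map { sprintf 'U+%04X', ord } split //, $mojibake;  # U+00C3 U+00A9
printf "%s\n", join ' ', map { sprintf '%02X',   ord } split //, $double;    # C3 83 C2 A9
```

Seeing "é" where "é" was expected therefore usually suggests that UTF-8 bytes were interpreted as Latin-1 somewhere between reading and writing.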

Shelley

----- Original Message -----
> From: "Eric Lease Morgan" <emor...@nd.edu>
> To: perl4lib@perl.org
> Sent: Tuesday, March 26, 2013 1:22:03 PM
> Subject: reading and writing of utf-8 with marc::batch
> 
> 
> For the life of me I can't figure out how to do reading and writing
> of UTF-8 with MARC::Batch.
> 
> I have a UTF-8 encoded file of MARC records. Dumping the records and
> grepping for a particular string illustrates the validity:
> 
>   $ marcdump und.marc | grep Sainte-Face
>   und.marc
>   1000 records
>   2000 records
>   3000 records
>   4000 records
>   5000 records
>   6000 records
>   7000 records
>   8000 records
>   9000 records
>   10000 records
>   11000 records
>   12000 records
>   245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
>   610 20 _aArchiconfrérie de la Sainte-Face
>   13000 records
>   $
> 
> I then run a Perl script that simply reads each record and dumps it
> to STDOUT. Notice how I define both my input and output as UTF-8:
> 
>   #!/shared/perl/current/bin/perl
> 
>   # configure
>   use constant MARC => './und.marc';
> 
>   # require
>   use strict;
>   use MARC::Batch;
> 
>   # initialize
>   binmode ( MARC, ":utf8" );
>   my $batch = MARC::Batch->new( 'USMARC', MARC );
>   $batch->strict_off;
>   $batch->warnings_off;
>   binmode( STDOUT, ":utf8" );
> 
>   # read & write
>   while ( my $marc = $batch->next ) { print $marc->as_usmarc }
> 
>   # done
>   exit;
> 
> But my output is munged:
> 
>   $ ./marc.pl > und.mrc
>   $ marcdump und.mrc | grep Sainte-Face
>   und.mrc
>   1000 records
>   2000 records
>   3000 records
>   4000 records
>   5000 records
>   6000 records
>   7000 records
>   8000 records
>   9000 records
>   10000 records
>   11000 records
>   12000 records
>   245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
>   610    _aArchiconfrérie de la Sainte-Face
>   13000 records
>   $
> 
> What am I doing wrong!?
> 
> --
> Eric Lease Morgan
> University of Notre Dame
> 
> 574/631-8604
> 
> 
> 
> 

-- 
Shelley Doljack  
E-Resources Metadata Librarian 
Metadata Department
Stanford University Libraries
sdolj...@stanford.edu
650-725-0167
