reading and writing of utf-8 with marc::batch

Eric Lease Morgan Tue, 26 Mar 2013 13:22:27 -0700

For the life of me I can't figure out how to read and write UTF-8 with 
MARC::Batch.


I have a UTF-8 encoded file of MARC records. Dumping the records and grepping 
for a particular string shows that the accented characters are intact:

  $ marcdump und.marc | grep Sainte-Face
  und.marc
  1000 records
  2000 records
  3000 records
  4000 records
  5000 records
  6000 records
  7000 records
  8000 records
  9000 records
  10000 records
  11000 records
  12000 records
  245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
  610 20 _aArchiconfrérie de la Sainte-Face
  13000 records
  $ 

I then run a Perl script that simply reads each record and dumps it to STDOUT. 
Notice how I define both my input and output as UTF-8:

  #!/shared/perl/current/bin/perl

  # configure
  use constant MARC => './und.marc';

  # require
  use strict;
  use MARC::Batch;

  # initialize
  binmode ( MARC, ":utf8" );
  my $batch = MARC::Batch->new( 'USMARC', MARC );
  $batch->strict_off;
  $batch->warnings_off;
  binmode( STDOUT, ":utf8" );

  # read & write
  while ( my $marc = $batch->next ) { print $marc->as_usmarc }

  # done
  exit;

But my output is munged. The "é" comes out as "Ã©", and the indicators on the 
610 field come out blank:

  $ ./marc.pl > und.mrc
  $ marcdump und.mrc | grep Sainte-Face
  und.mrc
  1000 records
  2000 records
  3000 records
  4000 records
  5000 records
  6000 records
  7000 records
  8000 records
  9000 records
  10000 records
  11000 records
  12000 records
  245 00 _aAnnales de l'ArchiconfrÃ©rie de la Sainte-Face
  610    _aArchiconfrÃ©rie de la Sainte-Face
  13000 records
  $
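
The "Ã©" in place of "é" is the byte pattern that results when UTF-8 text is 
encoded a second time. The following is only a minimal sketch, using the core 
Encode module and no MARC code at all, that reproduces that pattern for the 
single character involved. The helper hex_bytes and the variable names are 
made up for illustration, and the sketch does not show where in the script 
above a second encoding would take place.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Illustrative helper (not part of any library): return the bytes of a
# string as space-separated hex.
sub hex_bytes {
    my ($string) = @_;
    return join ' ', map { sprintf '%02x', ord } split //, $string;
}

# U+00E9 is the e-acute in "Archiconfrérie".
my $char = "\x{e9}";

# Encoded once: the bytes a UTF-8 file should contain for this character.
my $once = encode( 'UTF-8', $char );

# Encoded twice: the UTF-8 bytes are misread as Latin-1 characters and
# then encoded to UTF-8 again.
my $twice = encode( 'UTF-8', decode( 'ISO-8859-1', $once ) );

print 'once:  ', hex_bytes($once),  "\n";    # c3 a9
print 'twice: ', hex_bytes($twice), "\n";    # c3 83 c2 a9
```

Read as UTF-8, the bytes c3 83 c2 a9 are the two characters "Ã" (U+00C3) and 
"©" (U+00A9), which is what marcdump shows in the output above.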

What am I doing wrong!?

--
Eric Lease Morgan
University of Notre Dame

574/631-8604
